Preface


Probabilistic automata have been studied in the literature from different 
(although related) points of view. An approach emerging from information
theory was initiated by Shannon and Weaver in their classical book as early as
1948. Later, in 1958, a (somewhat vague) definition of a probabilistic automaton
was given by Ashby in a semipopular book. The theory began its real develop- 
ment only in the early sixties when scientists from different parts of the world 
introduced probabilistic automata as a natural generalization for deterministic 
automata of different types. 

Almost every book on automata theory published in the past few years con- 
tains some parts devoted to probabilistic automata (see, e.g., Harrison, 1965; 
Booth, 1967; Salomaa, 1969; Starke, 1969; Arbib, 1969; Carlyle, 1969). This 
seems to prove that there is growing interest in this new and fast developing area 
of research. This is a first attempt to devote a book to probabilistic automata 
and related topics, an attempt based on the assumption that the theory consid- 
ered is already mature enough to deserve a book of its own. 

The book is intended to serve both as a monograph and as a textbook and, 
as such, is augmented with a large collection of exercises distributed among the 
various sections. Some exercises are necessary for understanding the follow- 
ing sections; others, which the author considers to be hard, are marked with 
an asterisk. For the convenience of the reader, a section containing answers
and hints to selected exercises is given at the end of the book. A collection of 
open problems as well as an exhaustive bibliography are included for the benefit 
of those readers who may wish to continue research in the area. 

The choice of topics presented and their extent is, of course, subjective, and 
the author wishes to express his apologies to those who may feel that their work 
has not been covered thoroughly enough (after all, a first book in a new area 
is a first trial in a sequence of trials and errors). 

The book emerged from a two-quarter course given during two consecutive 
years at the Department of Electrical Engineering and Computer Sciences, 
University of California, Berkeley. Some parts of the book have also been 
presented in a course given at the Department of Mathematics, Technion, 
Haifa, Israel. While the first chapter of the book is engineering oriented, the 
other two chapters are mathematically oriented. The interdependency between 
the two parts is weak, and they can be presented separately and independently.
Only some theorems in Section C of Chapter II depend on the first chapter. 
The only prerequisites assumed for being able to follow the material in this 
book are: finite automata theory, e.g., Harrison (1965), Booth (1967), Salomaa 
(1969), Arbib (1969); linear algebra and matrices, e.g., MacDuffee (1964), Thrall
and Tornheim (1957), Gantmacher (1959); elementary probability theory, e.g., 
Feller (1957), and some mathematical maturity. 
A. NOTATIONS


The following notations are used throughout unless otherwise stated: 

$X$ or $\Sigma$ denotes an input alphabet with individual elements (symbols) $x$ and
$\sigma$ respectively. Sequences of symbols of $X$ or $\Sigma$ are called words or tapes and are
denoted by $u$ (when $X$ is the alphabet) or $x$ (when $\Sigma$ is the alphabet). $Y$ or $\Delta$
denotes an output alphabet with individual elements $y$ or $\delta$ respectively. Words
(or tapes) over $Y$ or $\Delta$ are denoted by $v$ or $y$ respectively. The set of all words
[including the empty word denoted by $\lambda$ or $e$] over $X$ (or $\Sigma$ or $Y$ or $\Delta$) is
denoted by $X^*$ (or $\Sigma^*$ or $Y^*$ or $\Delta^*$ respectively). Subsets of words over a given
alphabet are called events or languages and are denoted by $U$ or $V$ or $E$. If
$x = \sigma_1 \cdots \sigma_k$ and $x' = \sigma'_1 \cdots \sigma'_j$ are words, then $xx'$ is the word
$xx' = \sigma_1 \cdots \sigma_k \sigma'_1 \cdots \sigma'_j$ and the operation is called concatenation
($x\lambda = \lambda x = x$); $x^k$ denotes the word $xx \cdots x$ ($k$ times). If $U$ and $U'$ are
languages, then $UU' = \{xx' : x \in U, x' \in U'\}$ and $U^* = \bigcup_{k \ge 0} U^k$, where
$U^0 = \{\lambda\}$ and $U^k = U^{k-1}U$. $\emptyset$ denotes the empty language
[$U\emptyset = \emptyset U = \emptyset$]. Other set theoretic equations between languages are
denoted as usual. $l(u)$ denotes the length of the word $u$ [the number of symbols
in the word $u$]; $(u, v)$ denotes a pair of words of the same length, $u \in X^*$
and $v \in Y^*$; $l(u, v)$ denotes the common length of $u$ and $v$. If $S$ is a set,
then $|S|$ denotes the number of elements in $S$. The brackets ( ) are generally
used for enclosing an ordered set; the brackets { } are generally used
for enclosing an unordered set. The notation $[a_{ij}]$ is used for denoting a
matrix whose elements are $a_{ij}$. $\xi = (\xi_i)$ or $\xi_i = (\xi_{ij})$ denotes a vector $\xi$
or $\xi_i$ whose elements are $\xi_i$ or $\xi_{ij}$ respectively. The superscript $T$ over a
vector or a matrix [$\xi^T$ or $A^T$] denotes the transpose of the vector or the
matrix. $\eta$ denotes a column vector all the entries of which are equal to 1
and whose dimension will depend on the context. A vector is called
substochastic if all its entries are nonnegative and the sum of its entries is $\le 1$;
if the sum is $= 1$, then the vector is called stochastic. The set of all
$n$-dimensional stochastic vectors is denoted by 2. A matrix is called substochastic or
stochastic if it is square and all its rows are substochastic or stochastic
correspondingly. A matrix is called constant if all its rows are equal to one another.
The vector $(0, \ldots, 1, \ldots, 0)$, where the 1 is in the $i$th place and the dimension
depends on the context, is called a degenerate stochastic vector and is denoted
by the notation 5. The usual notation $\Pr(A|B)$ is used to denote the conditional
probability of the event $A$ given $B$. If $X_1, \ldots, X_n$ are point vectors, then
the combination $\sum_i \lambda_i X_i$ is a convex combination of them if $(\lambda_i)$ is a stochastic
vector. The notation $\mathrm{conv}(X_1, \ldots, X_n)$ stands for the convex closure of the set
$\{X_1, \ldots, X_n\}$. The set of point vectors $\{X_1, \ldots, X_n\}$ is linearly independent if the
set of vectors $\{X_2 - X_1, \ldots, X_n - X_1\}$ is linearly independent. A simplex is a
set of points which can be represented as the convex closure of a set of linearly
independent point vectors. A set of points is convexly independent if no point
in the set is a convex combination of the other points in the set. A convex
polyhedron is a set of points which can be represented as the convex closure of
a finite set of convexly independent points. If $V$ is a convex polyhedron and
$W \subset V$ [$W$ is a subset of $V$], then $W$ is a face of $V$ if the linear closure of $W$
[notation: aff $W$] has no points in common with the convex closure of $V - W$
[the set of points which are in $V$ but not in $W$]. The interior of a convex
polyhedron $V$ [notation: int $V$] is the set of all points in $V$ except the points
on the faces of $V$ which differ from $V$; the relative interior of $V$ [notation:
relint $V$] is the set of all points of $V$ except its vertices. Two functions are
equal if they have the same domain and agree on it. The term machine is used 
for devices which have both inputs and outputs, the term automaton is used for 
devices with input only [the output is represented directly by the internal 
states] and the term acceptor is used for automata, or machines which are used 
for discriminating between words over a given alphabet. Superscripts are used
for discriminating between different machines (automata, acceptors) and are
omitted if context is clear. 


B. SOME ANALYTICAL LEMMAS 


The following analytical lemmas are assumed. 


Lemma 2.1: Let $\{a_i\}$ and $\{b_i\}$ be nondecreasing sequences of real numbers such
that $a_i \le b_i$ for all $i$, with $\lim_{i \to \infty} a_i = a$, $\lim_{i \to \infty} b_i = b$ [including the case where
$a$ and/or $b$ is equal to $\infty$]. Then $a \le b$. If $a < b$, then there is a natural
number $N$ such that for all $j > N$ and all $i$ the inequality $a_i < b_j$ holds.


Corollary 2.2: If $a_i \le M$ for some real number $M$ and all $i$, then also $a \le M$.
If $a > M$ for some real number $M$, then there is an $i_0$ with $a_i > M$ for all $i > i_0$.


Lemma 2.3: Let $\{a_{mn}\}$ be a double sequence, nondecreasing with regard to both
$m$ and $n$. Then $\lim_{m \to \infty} \lim_{n \to \infty} a_{mn} = \lim_{n \to \infty} \lim_{m \to \infty} a_{mn}$ [including the case
where the limit has infinite value].


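Definition: Let $(a_i)$ be a set of real numbers. $\sup_i (a_i)$ is defined as the number
$\bar{a}$ such that $a_i \le \bar{a}$ for all $i$ and for any $\epsilon > 0$ there is $n$ such that $a_n > \bar{a} - \epsilon$;
$\inf_i (a_i)$ is the number $\underline{a}$ such that $\underline{a} \le a_i$ for all $i$ and for all $\epsilon > 0$ there is $n$
such that $a_n < \underline{a} + \epsilon$ [$\bar{a}$ or $\underline{a}$ can assume the values $+\infty$ or $-\infty$ also].


Lemma 2.4: If $(a_i)$ and $(b_i)$ are two sets of numbers such that $a_i \le b_i$ for all $i$,
then $\bar{a} \le \bar{b}$, $\underline{a} \le \underline{b}$. Moreover, if $a_i \le M$ for some real number $M$ and all $i$, then
$\bar{a} \le M$, and similarly if $a_i \ge M$ for all $i$, then $\underline{a} \ge M$.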


The notation $\prod_{i=1}^{\infty} a_i$ stands for the infinite product of a sequence of numbers
$(a_i)$ and is equal to $\lim_{n \to \infty} \prod_{i=1}^{n} a_i$ [provided that the limit exists and including
the case where the limit equals $\infty$].

The product $\prod_{i=1}^{\infty} a_i$ converges if there is $m$ with $a_i > 0$ for $i \ge m$ and
$\lim_{n \to \infty} \prod_{i=m}^{n} a_i$ exists and is finite.


Lemma 2.5: Let $(a_i)$ be a sequence of numbers, $0 \le a_i < 1$. If $\sum_{i=1}^{\infty} a_i$
diverges, then $\prod_{i=j}^{\infty} (1 - a_i)$ converges to zero for any $j$.


Lemma 2.6: Let $(a_i)$ be a sequence of numbers, $0 \le a_i$. If $\sum_{i=1}^{\infty} a_i < \infty$, then
the product $\prod_{i=1}^{\infty} (1 + a_i)$ converges.
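
As a numerical illustration of Lemma 2.5 (a sketch under an assumed sequence, not
part of the original text), take the hypothetical choice $a_i = 1/(i+1)$, whose sum
diverges. The partial products of $(1 - a_i)$ then telescope to $1/(n+1)$, which the
Python fragment below computes exactly with rational arithmetic. The function
names are illustrative only.

```python
from fractions import Fraction


def partial_product(terms):
    """Return the exact product of (1 - a) over the given terms."""
    result = Fraction(1)
    for a in terms:
        result *= 1 - a
    return result


def harmonic_terms(n):
    """Assumed sequence a_i = 1/(i+1) for i = 1..n; its sum diverges as n grows."""
    return [Fraction(1, i + 1) for i in range(1, n + 1)]


# The partial products shrink toward zero as more factors are included.
for n in (1, 10, 100):
    print(n, partial_product(harmonic_terms(n)))
```

Any other sequence with values in $[0, 1)$ and a divergent sum could be substituted
for `harmonic_terms`; this one was chosen only because the product has a closed form.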


C. SOME ALGEBRAIC PRELIMINARIES 


The notation $A \Rightarrow B$ is used for implication [“statement $A$ implies statement
$B$”]; $A \Leftrightarrow B$ means that statement $A$ is equivalent to statement $B$; $a \in A$ means
that $a$ is an element of $A$ and $\{a : A\}$ stands for the set of all elements $a$
satisfying the property $A$. The Cartesian product of two sets $A$ and $B$ is defined as
$A \times B = \{(a, b) : a \in A, b \in B\}$.

A (binary) relation between the sets $A$ and $B$ [including the case where
$B = A$] is a subset of $A \times B$. If $R$ denotes a relation and $(a, b) \in R$ we shall
denote this also by the notation $aRb$. A relation $R$ between $A$ and $A$ (“over
$A$”) is

1. Reflexive if aRa for every a € A. 

2. Symmetric if aRb => bRa. 

3. Transitive if aRb and bRc => aRc. 

A relation satisfying all three properties above is called an equivalence
relation.

Any equivalence relation $R$ over a set $A$ induces a partition of the set $A$ into
subsets $A_i$ such that $A_i \cap A_j = \emptyset$ if $i \ne j$, $\bigcup_i A_i = A$, and $aRb$ if and only if both
$a$ and $b$ are in the same subset $A_i$ for some $i$. The subsets $A_i$ as above are called
equivalence classes of $R$. If the number of different equivalence classes induced
by a relation $R$ over a set $A$ is finite, then the relation $R$ is of finite index. Let
$A$ be a set with an operation $\circ : A \times A \to A$ and a relation $R$ over $A$.


1. $R$ is right invariant if $aRb$ implies that for any $c$, $(a \circ c) \, R \, (b \circ c)$.

2. $R$ is left invariant if $aRb$ implies that for any $c$, $(c \circ a) \, R \, (c \circ b)$.

3. R is a congruence relation if it is an equivalence relation and it is both 
left and right invariant. 


Let $a$, $b$, $c$ be integers; then $a \equiv b \bmod c$ [“$a$ is congruent to $b$ modulo $c$”]
means that $c$ is a factor of $a - b$. Congruence modulo an integer $c$ has the
following properties:


Lemma 3.1: If $a \equiv b \bmod c$ and $a' \equiv b' \bmod c$, then $a + a' \equiv b + b' \bmod c$
and $aa' \equiv bb' \bmod c$.
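
The statement of Lemma 3.1 can be checked mechanically on a finite range of
integers. The following Python sketch does so by brute force; it is a sanity
check under the assumption of a nonzero modulus, not a proof, and the helper
names `congruent` and `check_lemma_3_1` are hypothetical.

```python
def congruent(a, b, c):
    """True when c is a factor of a - b, i.e. a is congruent to b modulo c (c != 0)."""
    return (a - b) % c == 0


def check_lemma_3_1(c, bound):
    """Test both conclusions of Lemma 3.1 for all integers in [-bound, bound]."""
    values = range(-bound, bound + 1)
    for a in values:
        for b in values:
            if not congruent(a, b, c):
                continue
            for a2 in values:
                for b2 in values:
                    if not congruent(a2, b2, c):
                        continue
                    # Sums and products of congruent pairs must stay congruent.
                    if not congruent(a + a2, b + b2, c):
                        return False
                    if not congruent(a * a2, b * b2, c):
                        return False
    return True


print(check_lemma_3_1(c=5, bound=6))
```

In the terminology of this section, the two conclusions say that congruence
modulo $c$ is invariant under both addition and multiplication of integers.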


We conclude this section with two lemmas concerning operations between 
infinite [countable] stochastic matrices. 


Lemma 3.2: The set of countable stochastic matrices is closed under matrix 
multiplication. 


Lemma 3.3: Multiplication of countable stochastic matrices is associative. 
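
Lemmas 3.2 and 3.3 concern countable matrices; the finite case can at least be
illustrated numerically. The Python sketch below uses three hypothetical
$2 \times 2$ stochastic matrices with exact rational entries and checks that a
product is again stochastic and that the products associate. The matrices and
function names are assumptions made for the illustration.

```python
from fractions import Fraction as F


def matmul(P, Q):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def is_stochastic(P):
    """Square, nonnegative entries, every row summing to exactly 1."""
    return all(len(row) == len(P)
               and all(entry >= 0 for entry in row)
               and sum(row) == 1
               for row in P)


# Hypothetical 2 x 2 stochastic matrices, chosen only for illustration.
P = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
Q = [[F(1), F(0)], [F(1, 3), F(2, 3)]]
R = [[F(0), F(1)], [F(1, 2), F(1, 2)]]

print(is_stochastic(matmul(P, Q)))
print(matmul(matmul(P, Q), R) == matmul(P, matmul(Q, R)))
```

For countable matrices the sums involved become infinite series, so the finite
check above does not replace the convergence argument the lemmas require.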


D. PROBABILISTIC PRELIMINARIES


Consider a physical experiment such as tossing a coin, matching a deck of 
cards, observing the life-span of radioactive atoms, etc. The set of all possible 
outcomes of such an experiment is called a sample space. The elements of a 
sample space are called sample points and aggregates of sample points or sub- 
sets of the sample space are called events. In what follows we shall concern 
ourselves only with finite or countable sample spaces [i.e., sample spaces
containing finitely many or at most a countable number of elements].

The set of all events over a sample space [including the empty set, to be
denoted by $\emptyset$, and the whole space considered as an event, to be denoted by
$\Omega$] is closed under countable intersection and union, and under
complementation with regard to $\Omega$. The set of all events as above with the operations of
union, intersection, and complementation is sometimes called a $\sigma$-algebra [see
Feller (1966)].

A probability measure $p$ over a $\sigma$-algebra $\mathcal{A}$ as defined above is a function
$p$ from $\mathcal{A}$ into the interval $[0, 1]$ of real numbers such that:


1. $p(A) \ge 0$ is defined for all $A$ in $\mathcal{A}$.

2. $p(\Omega) = 1$.

3. If $\{A_i\}_{i=1}^{\infty}$ is a countable set of nonoverlapping or disjoint events in $\mathcal{A}$, then
$p\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} p(A_i)$.

It is easy to show that (1)-(3) above imply:


4. $p(\emptyset) = 0$.
5. $p(\Omega - A) = 1 - p(A)$.


A random variable is a function from the sample space into the real numbers. 
Under the assumption that the sample space is at most countable, no restric- 
tion is placed on such a function. 


Example: The physical experiment: Tossing a coin 100 times. The sample
space: All $2^{100}$ possible outcomes. A sample point: The coin falls “heads” all
the 100 times. An event: The coin falls “tails” for 50 consecutive times. A
probability measure over $\Omega$: If $\omega \in \Omega$ is a sample point such that the coin
falls heads $m$ times and it falls tails $100 - m$ times, then $p(\omega) = p^m q^{100-m}$ where
$0 \le p \le 1$, $0 \le q \le 1$, $p + q = 1$, and $p$ and $q$ are real numbers. If $A$ is an event,
then $p(A) = \sum_{\omega \in A} p(\omega)$. A random variable over $\Omega$: Let $x(\omega)$ be the
function $x(\omega)$ = the number of “heads” in the sample point $\omega$; then $x(\omega)$ is a
random variable.

Given a $\sigma$-algebra, a probability measure, and a random variable over it, a
related distribution function from the real numbers to the interval $[0, 1]$ is
defined as follows: Let $A_t$ be the event $A_t = \{\omega : x(\omega) \le t\}$. Then the distribution
function is the function $F(t) = p(A_t)$. Sometimes the notation $p(x(\omega) \le t)$ is
used for $p(A_t)$, and the notation $p(x(\omega) = t)$ is used for $p(B_t)$ where $B_t$ is the
event $\{\omega : x(\omega) = t\}$.
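
The coin-tossing example and the distribution function can be made concrete on
a small scale. The Python sketch below assumes 4 tosses instead of 100 and a
hypothetical value $p = 1/3$ for the probability of heads, so that the whole
sample space can be enumerated; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

N_TOSSES = 4               # assumption: 4 tosses instead of the 100 of the example
P_HEADS = Fraction(1, 3)   # hypothetical value of p
Q_TAILS = 1 - P_HEADS      # q = 1 - p

# Sample space: all 2**N_TOSSES sequences of heads (H) and tails (T).
sample_space = list(product("HT", repeat=N_TOSSES))


def heads(omega):
    """Random variable x(omega): the number of heads in the sample point."""
    return omega.count("H")


def prob_point(omega):
    """p(omega) = p**m * q**(n - m), where m is the number of heads."""
    m = heads(omega)
    return P_HEADS ** m * Q_TAILS ** (N_TOSSES - m)


def prob_event(event):
    """p(A) = sum of p(omega) over the sample points omega in A."""
    return sum(prob_point(omega) for omega in event)


def distribution(t):
    """F(t) = p({omega : x(omega) <= t})."""
    return prob_event([omega for omega in sample_space if heads(omega) <= t])


print(prob_event(sample_space))
print([distribution(t) for t in range(N_TOSSES + 1)])
```

The first printed value confirms property 2 of a probability measure for this
small space; the second lists the nondecreasing values of $F$ at the integers.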

Given a $\sigma$-algebra and a probability measure over it, the conditional
probability $p(A|B)$ [read: the probability of $A$ given that $B$, where $A$ and $B$ are
events] is defined as $p(A|B) = p(A \cap B)/p(B)$. The intuitive meaning of the
above definition is as follows: If it is given that the event $B$ occurred, then
the sample space reduces to the points in $B$, and the event $A$ reduces to the
event $A \cap B$, so that $p(A|B)$ is the proportion of the weight of the event $A \cap B$
to the weight of the event $B$ [usually $p(A)$ is interpreted as the proportion of
the weight of $A$ to the weight of the whole space $\Omega$, which is equal to 1].

Given two random variables $x$ and $y$ over a $\sigma$-algebra and a probability
measure over it, one can define the following function

$$p(x = t \mid y = u) = \frac{p(x = t \wedge y = u)}{p(y = u)} \qquad (*)$$

where $x = t \wedge y = u$ is the event $\{\omega : x(\omega) = t \wedge y(\omega) = u\}$. The random
variables are said to be independent if

$$p(x = t \mid y = u) = p(x = t)$$

i.e., the information that $y(\omega) = u$ does not change the probability of $x(\omega) = t$.
It follows from formula $(*)$ that if $x$ and $y$ are independent [or more generally,
if $A$ and $B$ are independent events, i.e., $p(A|B) = p(A)$], then

$$p(x = t \wedge y = u) = p(x = t \mid y = u) \cdot p(y = u) = p(x = t) \cdot p(y = u)$$

More generally, if $A$ and $B$ are independent events, then

$$p(A \cap B) = p(A) \cdot p(B)$$


Let $x_0, x_1, \ldots$ be a sequence of random variables such that for any $m$

$$p(x_m = j \mid x_0 = n_0, x_1 = n_1, \ldots, x_{m-1} = i) = p(x_m = j \mid x_{m-1} = i)$$

i.e., the random variable $x_m$ depends on the random variable $x_{m-1}$ but not on
the previous ones. Such a system is called a Markov chain. We shall consider
only finite or countable Markov chains, i.e., Markov chains over a sample
space containing finitely many or a countable number of elements.

Any Markov chain can be represented in the following model: The sample
space is represented by a finite or a countable number of vertices; the random
variable $x_i$ represents the position of a moving point at time $t = i$; $p(x_i = j)$
is the probability that the point will be at the vertex $v_j$ at time $t = i$ and
$p(x_m = j \mid x_{m-1} = i)$ is the probability that the point will be at vertex $v_j$ at time
$t = m$ provided that it has been at vertex $v_i$ at time $t = m - 1$.

As the process is assumed to be Markov we have that

$$p(x_m = j \mid x_{m-1} = i) = p(x_m = j \mid x_0 = n_0, \ldots, x_{m-1} = i)$$

and we shall use, for the above probability, the notation ${}_m p_{ij}$.

If ${}_m p_{ij} = {}_n p_{ij}$ for any natural numbers $m$ and $n$, then the Markov chain is
called homogeneous and it is called nonhomogeneous otherwise. As the values
${}_m p_{ij}$ are independent of $m$ in the first case, we shall use the notation $p_{ij}$ for that
case. It is tacitly assumed throughout that the Markov chains considered are
discrete, i.e., the transitions from state to state occur at discrete intervals of
time.

The probabilities ${}_m p_{ij}$ can be arranged in a matrix form and such a matrix
is called stochastic or Markov. Clearly any Markov matrix $[{}_m p_{ij}]$ has the
property that $0 \le {}_m p_{ij} \le 1$ and $\sum_j {}_m p_{ij} = 1$, which stems from the fact that
the system represented by the matrices $[{}_m p_{ij}]$ evolves in time and it must enter some
state at time $t = m + 1$ if it has been in state $i$ at time $m$, where the term
“state” is used for denoting a point in the sample space.

Examples:


1. Sequential deterministic machine with possible errors.

2. A slot machine: the static positions of the dials represent the states. In
this case the ${}_m p_{ij}$ are generally independent of $m$.

3. Suppose some person is ill with probability $p_0$ [the probability of him
being healthy is $q_0 = 1 - p_0$]. After swallowing a specific medicine he may
change state [there are two states, representing illness and healthiness], the
probability of the transition from state $i$ to state $j$ at time $m$ being ${}_m p_{ij}$,
depending on the medicine swallowed at time $m$.
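
The third example can be sketched numerically as a nonhomogeneous two-state
chain. In the Python fragment below the initial distribution and the two
transition matrices (one per medicine) are hypothetical values chosen for
illustration; the state distribution is propagated by multiplying a row vector
by the matrix in force at each step.

```python
from fractions import Fraction as F


def step(distribution, P):
    """One transition: the row vector `distribution` multiplied by matrix P."""
    return [sum(distribution[i] * P[i][j] for i in range(len(distribution)))
            for j in range(len(P[0]))]


def distribution_after(initial, matrices):
    """Apply the transition matrices in order, one per time step."""
    current = list(initial)
    for P in matrices:
        current = step(current, P)
    return current


# States: index 0 = ill, index 1 = healthy (hypothetical numbers throughout).
INITIAL = [F(3, 4), F(1, 4)]                              # p_0 = 3/4, q_0 = 1/4
MEDICINE_1 = [[F(1, 2), F(1, 2)], [F(1, 10), F(9, 10)]]   # matrix used at time 1
MEDICINE_2 = [[F(1, 5), F(4, 5)], [F(0), F(1)]]           # matrix used at time 2

print(distribution_after(INITIAL, [MEDICINE_1, MEDICINE_2]))
```

Because the matrix changes with the time step, the chain is nonhomogeneous in
the sense defined above; using the same matrix at every step would give a
homogeneous chain.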


EXERCISES 


1. Prove all the lemmas given without proof in the preceding sections.


2. Prove that the set of $n \times n$ stochastic matrices is a monoid under matrix
multiplication [i.e., the set of stochastic $n \times n$ matrices is closed under
multiplication and the unit $n \times n$ matrix is stochastic].


3. Let $P(m) = [{}_m p_{ij}]$ be the transition probabilities matrix at time $m$ of a given
Markov chain. Denote $\prod_{i=1}^{n} P(i) = [p_{ij}^{(n)}]$; prove that $p_{ij}^{(n)}$ is the probability that
the process will go to state $j$ beginning from state $i$ after $n$ steps.


4. A stochastic matrix $P$ is called constant if all its rows are equal. Prove: If
$P$ is constant stochastic and $Q$ is stochastic [of the same order], then $PQ$ is
constant stochastic and $QP = P$. [Thus $P^2 = P$, which means that stochastic
constant matrices are idempotent.]


5. Let $P$ be a stochastic matrix such that there is an integer $k_0$ with $P^{k_0}$
constant. Prove that in this case, for all $m \ge k_0$, $P^m = P^{k_0}$.


6. If $\pi = (\pi_i)$ is a vector such that $\sum_i \pi_i = t$ and $P$ is a stochastic matrix [of
the same order], then the sum of the entries of the vector $\pi P$ is also equal to $t$.


7. Prove: If $P$ and $Q$ are finite stochastic matrices such that $PQ = I$, then
both $P$ and $Q$ are degenerate. [A stochastic matrix is degenerate if all its
entries are either 0 or 1.]


8. Prove, by an example, that Exercise 7 above is not true in the infinite case 
unless it is required that both P and Q have nonzero elements only. 


Chapter I


Stochastic Sequential Machines


INTRODUCTION 


In this chapter we introduce various mathematical models of stochastic se- 
quential machines (SSMs) and provide motivation for these models. Methods 
for synthesizing SSMs from their mathematical models are given. Various con- 
cepts of equivalence and coverings for SSMs are introduced and studied. Some 
decision problems and minimization-of-states problems induced by the above 
concepts are investigated and a procedure is formulated for constructing a
minimal state SSM equivalent to a given one. The last part of this chapter is
devoted to stochastic input-output relations and their representability by SSMs.


A. THE MODEL 


1. Definitions and Basic Relations 


Definition 1.1: A stochastic sequential machine (SSM) is a quadruple
$M = (S, X, Y, \{A(y|x)\})$ where $S$, $X$, and $Y$ are finite sets [the internal states, inputs,
and outputs respectively], and $\{A(y|x)\}$ is a finite set containing $|X| \times |Y|$
square matrices of order $|S|$ such that $a_{ij}(y|x) \ge 0$ for all $i$ and $j$, and

$$\sum_{y \in Y} \sum_{j=1}^{|S|} a_{ij}(y|x) = 1 \quad \text{where } A(y|x) = [a_{ij}(y|x)]$$

Interpretation: Let $\pi$ be any $|S|$-dimensional vector. If the machine begins
with an initial distribution $\pi$ over the state set $S$ and is fed sequentially with a
word $u = x_1 \cdots x_k$, it prints the word $v = y_1 \cdots y_k$ and moves on to the
next state. The transition is controlled by the matrices $A(y|x)$ where $a_{ij}(y|x)$
is the probability of the machine going to state $s_j$ and printing the symbol $y$,
given it had been in state $s_i$ and fed with the symbol $x$.
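
The conditions of Definition 1.1 are easy to check mechanically. The Python
sketch below represents a machine as a dictionary mapping each pair $(y, x)$ to
a matrix; this representation, the function name `is_ssm`, and the two-state
machine used as data are assumptions made for illustration and are not taken
from the text.

```python
from fractions import Fraction as F


def is_ssm(states, inputs, outputs, A):
    """Check the conditions of Definition 1.1.

    A maps each pair (y, x) to a square matrix (list of rows) of order |S|.
    """
    n = len(states)
    for x in inputs:
        for y in outputs:
            matrix = A.get((y, x))
            if matrix is None or len(matrix) != n:
                return False
            if any(len(row) != n or any(e < 0 for e in row) for row in matrix):
                return False
        # For each state i, summing over all outputs y and next states j gives 1.
        for i in range(n):
            if sum(A[(y, x)][i][j] for y in outputs for j in range(n)) != 1:
                return False
    return True


# Hypothetical two-state machine with inputs 0, 1 and outputs a, b.
STATES, INPUTS, OUTPUTS = ["s1", "s2"], ["0", "1"], ["a", "b"]
MACHINE = {
    ("a", "0"): [[F(1, 2), F(0)], [F(0), F(1, 4)]],
    ("b", "0"): [[F(0), F(1, 2)], [F(1, 4), F(1, 2)]],
    ("a", "1"): [[F(0), F(1)], [F(1, 3), F(0)]],
    ("b", "1"): [[F(0), F(0)], [F(1, 3), F(1, 3)]],
}

print(is_ssm(STATES, INPUTS, OUTPUTS, MACHINE))
```

Note that the individual matrices $A(y|x)$ need not be stochastic; only their
sum over all outputs $y$, for a fixed input $x$, has rows adding up to 1.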


Examples: 

a. Any deterministic sequential machine with faulty elements which may 
cause errors in transition from state to state is an SSM. 

b. Consider a psychological [or physical] experiment such that a sequence of 
stimuli [inputs] is applied to an animal [or to a physical system]. The system, 
assumed to have a finite number of possible internal states [which may or may 
not be observable], responds with a sequence of outputs and undergoes succes- 
sive changes of its internal state. Transition is generally not deterministic, nor 
is the relationship between inputs and outputs. 

c. A finite-state communication channel (Shannon, 1948) transmitting
symbols from a source alphabet $X$, the symbols received belonging to an output
alphabet $Y$. The channel may assume a finite number of states and is specified
by a conditional probability function $p(y, s_j | s_i, x)$, interpreted as the probability
of the output symbol received being $y$ and of the channel remaining in state
$s_j$, given the channel is in state $s_i$ and the input symbol $x$ is transmitted. Such
a communication channel is readily described by an SSM.

d. Consider a situation where a pursuer is following a moving object (Zadeh,
1963), with both capable of assuming a finite number of positions [states].
Assume also that the motion of the pursuer is characterized by a conditional
probability distribution $p_{ij}(x)$ (which denotes the probability of the pursuer
moving to state $j$ from state $i$ on application of $x$) where $x$ is one of several
controls (inputs) available to the pursuer. As for the object, assume that it does
not seek to evade the pursuer [the alternative case can be dealt with in a similar
way] and that its motion is governed by a probability distribution $q_{kl}$ [which
denotes the probability of the object moving to state $l$ from state $k$]. The
combined system can be described by an SSM with set of states $S$ equal to that of all
pairs $(i, k)$ with $i$ referring to the pursuer and $k$ to the object; the set of inputs
$X$ is that of all controls available to the pursuer; the set of outputs is identified
here with that of states, and the transition function is given by

$$a_{(i,k),(j,l)}(x) = p_{ij}(x) \, q_{kl}$$

[It is tacitly assumed that the random variables controlling the pursuer and
the object are mutually independent.] In this setup the problem of the pursuer
is to find a minimal sequence of inputs which takes the composite system from
its initial state to an “interception” state.
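
The transition function of the composite pursuer-object system is a product of
the two individual transition probabilities, so for a fixed control the
composite matrix can be built entry by entry. The Python sketch below does this
for a hypothetical pursuer matrix (for one control) and a hypothetical object
matrix, each with two positions; the ordering of the pairs $(i, k)$ and all
names are assumptions of the sketch.

```python
from fractions import Fraction as F


def composite(P, Q):
    """Matrix of the composite system for one control.

    Pairs (i, k) index the rows and pairs (j, l) the columns, both listed with
    the object's index varying fastest; the entry is P[i][j] * Q[k][l].
    """
    n, m = len(P), len(Q)
    return [[P[i][j] * Q[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]


# Hypothetical data: pursuer matrix for one control, and the object matrix.
PURSUER = [[F(3, 4), F(1, 4)], [F(1, 2), F(1, 2)]]
OBJECT = [[F(1, 3), F(2, 3)], [F(1), F(0)]]

A = composite(PURSUER, OBJECT)
for row in A:
    print(row, sum(row))
```

Each row of the composite matrix sums to 1 because both factors are stochastic,
which is what the independence assumption in the text provides.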


Let $M$ be an SSM. Let $A(v|u)$ be defined as

$$A(v|u) = [a_{ij}(v|u)] = A(y_1|x_1) A(y_2|x_2) \cdots A(y_k|x_k) \qquad (1)$$

It follows from the interpretation of the values $a_{ij}(y|x)$ that $a_{ij}(v|u)$ is the
probability of the machine going to state $s_j$ and printing the word $v$, having
been in state $s_i$ and fed sequentially the word $u$. This assertion is clearly true
for $l(v, u) = 1$, since in this case $(v, u) = (y, x)$ for some $y$ and $x$. Assuming
now that the assertion is true for $l(v, u) = k - 1$, we have, by the notation (1)
above, that

$$a_{ij}(vy|ux) = \sum_{m} a_{im}(v|u) \, a_{mj}(y|x) \qquad (2)$$

and the right-hand side of (2) is, by elementary rules of probability, the
probability of the machine going to state $s_j$ and printing the word $vy$, having been
in state $s_i$ and fed sequentially the word $ux$. The assertion is thus proved true
for any pair $(v, u)$ with $l(v, u) \ge 1$ by induction. For $l(v, u) = l(\lambda, \lambda) = 0$, we
define $A(\lambda|\lambda) = I$, the $|S|$-dimensional unity matrix, meaning that with
probability 1 there is no change in the internal state of the machine and no output
emerges if no input is fed.


Notation: $\eta$ denotes a column vector with all entries equal to 1, and with
dimension equal to the number of states of the machine to which it is related.

Definition 1.2: Given a machine $M$ and an input-output pair of words $(v, u)$,
the vector $\eta(v|u)$ is defined as

$$\eta(v|u) = A(v|u)\eta \qquad (\eta(\lambda|\lambda) = I\eta = \eta) \qquad (3)$$

Interpretation: The $i$th entry in vector $\eta(v|u)$ is the sum of all
entries in the $i$th row of matrix $A(v|u)$, and is therefore the probability of the
machine printing the word $v$ [and moving to some state], having been in state
$s_i$ and fed the word $u$.

It follows from (3) and (1) that

$$\eta(vy|ux) = A(vy|ux)\eta = A(v|u)A(y|x)\eta = A(v|u)\eta(y|x) \qquad (4)$$

Similarly,

$$\eta(yv|xu) = A(y|x)\eta(v|u) \qquad (5)$$

Definition 1.3: Let $\pi$ be a [probabilistic] initial distribution vector over the
states of a given machine $M$, and let $(v, u)$ be any input-output pair of words.

The vector $\pi(v|u)$ and the function $p_\pi(v|u)$ are defined as

$$\pi(v|u) = \pi A(v|u) \qquad (\pi(\lambda|\lambda) = \pi I = \pi) \qquad (6)$$

$$p_\pi(v|u) = \pi\eta(v|u) \qquad (= \pi(v|u)\eta) \qquad (7)$$

It follows by elementary rules of probability and from the interpretation of
$\eta(v|u)$ that $p_\pi(v|u)$ is the probability of the machine printing the word $v$ when
started with initial distribution $\pi$ over its states and fed with the word $u$.
Similarly, $\pi_i(v|u)$, the $i$th entry of the vector $\pi(v|u)$, is the probability of the machine
printing the word $v$ and moving to state $s_i$ when started with initial distribution
$\pi$ over its states and fed the word $u$. The following equalities are easily verified:

$$p_\pi(vv_1|uu_1) = \pi\eta(vv_1|uu_1) = \pi A(vv_1|uu_1)\eta = \pi A(v|u) A(v_1|u_1)\eta = \pi(v|u)\eta(v_1|u_1) \qquad (8)$$

Note that $\pi(v|u)$ need not be a stochastic vector, as there may be several output
words $v$, with positive probability, corresponding to a given input word $u$. Let
$\bar{\pi}(v, u)$ be the vector whose $i$th entry is the probability of the machine moving
to state $s_i$ given that the machine started with initial distribution $\pi$ over its
states, the input has been $u$, and the output $v$. It follows that

$$\bar{\pi}(v, u) \cdot p_\pi(v|u) = \pi(v|u) \qquad (9)$$

To prove this relation, we rewrite it in the form: Pr [final state $s_i$ | output $v$,
input $u$, initial distribution $\pi$] $\cdot$ Pr [output $v$ | input $u$, initial distribution $\pi$] =
Pr [final state $s_i$, output $v$ | input $u$, initial distribution $\pi$].

It follows from (9) that

$$\bar{\pi}(v, u) = \begin{cases} \pi(v|u)/p_\pi(v|u) & \text{if } p_\pi(v|u) \ne 0 \\ \text{undefined} & \text{otherwise} \end{cases}$$

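If $p_\pi(v|u) \ne 0$, then $\bar{\pi}(v, u)$ is a probabilistic vector; moreover, in this case we
also have the relation

$$p_\pi(vv_1|uu_1) = p_\pi(v|u) \, p_{\bar{\pi}(v,u)}(v_1|u_1) \qquad (10)$$

since, using (8), (6), (7), and (9) we get

$$\begin{aligned}
p_\pi(vv_1|uu_1) &= \pi(v|u)\eta(v_1|u_1) \\
&= \pi(v|u)\eta \cdot \frac{\pi(v|u)}{\pi(v|u)\eta} \, \eta(v_1|u_1) \\
&= p_\pi(v|u) \, \bar{\pi}(v, u) \, \eta(v_1|u_1) \\
&= p_\pi(v|u) \, p_{\bar{\pi}(v,u)}(v_1|u_1)
\end{aligned}$$

as required. If $p_\pi(v|u) = 0$, we define $p_\pi(vv_1|uu_1) = 0$ for any input-output pair
of words $(v_1, u_1)$.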
Example 1: Let $M = (S, X, Y, \{A(y|x)\})$ with $X = \{0, 1\}$, $Y = \{a, b\}$,
$S = \{s_1, s_2\}$, and


sao = | a AG) = [* ‘| 
2. 


san- fl aon =[* >] 




and let z = (4 3) be an initial distribution for M. It is easily verified that 


1 


n(ab|00) = A(ab|00) n = E dm = () 


n(ab|00) = A(ab|00) = G 3) i | 2G) 


A(ab|00) = A(a|0) A(b|0) = k 4 


p.(ab|00) = z(ab|00)g = 1 
Rab, 00) = z(ab|00)/p.(ab|00) = ($ à 
Similarly, 
n(a|0) = ( $), p,(a|0) = $ 
£(a,0) = (49), Prva, o(5|0) = $ 
so that 
PL a\0) Pea, (b10) =4 $ = i d p.(ab|00) 
in accordance with (10). 

Note the difference between $\pi(ab|00)$ and $\bar{\pi}(ab, 00)$. The first vector is not
probabilistic, and the values in it are the probabilities of the machine entering
the first (second) state and printing the output $ab$, given that the input is 00
and the initial distribution is $\pi$. However, this input and initial distribution may
also have other outputs ($ba$ or $bb$ or $aa$) with positive probability. In the vector
$\bar{\pi}(ab, 00)$, both the input and the output are assumed in advance.
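
The quantities defined in (1), (6), (7), and (9) can be computed directly from
the matrices $A(y|x)$. The Python sketch below does so for a hypothetical
two-state machine (its matrices are assumptions made for the illustration and
are not those of Example 1) and checks relation (10) for the pair $(ab, 00)$;
all function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

N = 2  # number of states of the hypothetical machine below
A = {
    ("a", "0"): [[F(1, 2), F(0)], [F(0), F(1, 4)]],
    ("b", "0"): [[F(0), F(1, 2)], [F(1, 4), F(1, 2)]],
    ("a", "1"): [[F(0), F(1)], [F(1, 3), F(0)]],
    ("b", "1"): [[F(0), F(0)], [F(1, 3), F(1, 3)]],
}


def word_matrix(v, u):
    """A(v|u) as in (1); the identity matrix for the empty pair of words."""
    M = [[F(int(i == j)) for j in range(N)] for i in range(N)]
    for y, x in zip(v, u):
        step = A[(y, x)]
        M = [[sum(M[i][k] * step[k][j] for k in range(N)) for j in range(N)]
             for i in range(N)]
    return M


def pi_word(pi, v, u):
    """The row vector pi(v|u) = pi A(v|u) of relation (6)."""
    M = word_matrix(v, u)
    return [sum(pi[i] * M[i][j] for i in range(N)) for j in range(N)]


def p(pi, v, u):
    """p_pi(v|u) = pi(v|u) eta of relation (7): the sum of the entries of pi(v|u)."""
    return sum(pi_word(pi, v, u))


def pi_bar(pi, v, u):
    """Normalized vector of relation (9); None where the text leaves it undefined."""
    total = p(pi, v, u)
    if total == 0:
        return None
    return [entry / total for entry in pi_word(pi, v, u)]


PI = [F(1, 2), F(1, 2)]
lhs = p(PI, "ab", "00")
rhs = p(PI, "a", "0") * p(pi_bar(PI, "a", "0"), "b", "0")
print(lhs, rhs, lhs == rhs)

# The probabilities of all output words of length 2 add up to 1.
total = sum(p(PI, "".join(v), "00") for v in product("ab", repeat=2))
print(total)
```

The vector returned by `pi_word` is in general not stochastic, whereas the one
returned by `pi_bar` is, in line with the distinction drawn above.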


EXERCISES 


1. Let $M$ be as in Example 1 and $\pi = (0 \; 1)$ an initial distribution for $M$.


a. Find: $A(v|u)$, $\eta(v|u)$, $\pi(v|u)$, $\bar{\pi}(v, u)$, $p_\pi(v|u)$ with $v = bb$ and $u = 10$.
b. The same with $v = ab$, $u = 10$; in this case, compute also the value
$p_\pi(aba|100)$. Discuss your results.


2. Show that every deterministic sequential machine of the Mealy type can be 
represented as an SSM as given in Definition 1.1. 


3. Give an algorithm for recursive construction of any vector of the form $\pi(v|u)$
for a given machine.


4. For a given machine $M$ and a given initial distribution $\pi$, prove that the
vector $\sum_v \pi(v|u)$ is probabilistic for any given input $u$ (summation over all
possible outputs $v$ having the same length as $u$).


5. For the machine given in Example 1, find: 


a. the value $q(b|001, ab)$ = the probability of the next output being $b$, given
that the input 00 resulted in the output $ab$ and the input 1 was fed next,
b. the value $r(b|001)$ = the probability of the final output being $b$, after the
input 001 is fed.


6. Give three reasons why the following quadruple is not an SSM: 
$M = (S, X, Y, \{A(y|x)\})$ with $S = \{s_1, s_2\}$,
$X = \{0, 1\}$, $Y = \{a, b\}$
[i0 Fro 3 
aao =l 7, — 49" v 


E; 4i L 


Acalt) — | 2 A(b|1) = 
L2 4 


2. Moore, Mealy, and Other Types of SSMs 


In the preceding section, we described an SSM parallel to the Mealy-type de- 
terministic sequential machine. The Moore-type machine also has a stochastic 
version which will be described below. 


Definition 2.1: A Moore-type SSM is a quintuple $M = (S, X, Y, \{A(x)\}, \Lambda)$
where $S$, $X$, and $Y$ are as in Definition 1.1, $\{A(x)\}$ is a finite set containing $|X|$
square stochastic matrices of order $|S|$, and $\Lambda$ is a deterministic function from $S$
into $Y$.


Interpretation: In accordance with the interpretation following Definition
1.1, the value $a_{ij}(x)$ [$A(x) = [a_{ij}(x)]$] is the probability of the machine moving
from state $s_i$ to $s_j$ when fed the symbol $x$. When entering state $s_j$, the machine
prints the symbol $\Lambda(s_j) \in Y$.

Let $A(u)$ be defined as

$$A(u) = [a_{ij}(u)] = A(x_1) A(x_2) \cdots A(x_k), \qquad A(\lambda) = I \qquad (11)$$


It follows from the above interpretation that $a_{ij}(u)$ is the probability of the
machine moving from state $s_i$ to $s_j$ when fed the word $u$. [The proof of this
assertion, along the same lines as for the corresponding assertion in the
preceding section, is left to the reader.] The output word $v$ depends on the sequence
of states through which the machine passed when scanning the input word $u$.
It is worth noting here that, as in the deterministic case, there is a basic
difference (implicit in the definitions) between Moore-type and Mealy-type machines.
For the latter, the output depends on the input and the current state, and is
intuitively associated with the transition; thus $p_\pi(\lambda|\lambda) = \pi I \eta = \pi\eta = 1$, since
no output emerges when there is no input. By contrast, the output of a
Moore-type machine depends on the next state and is intuitively associated with a state;
thus $p_\pi(\lambda|\lambda) = 0$ and there is a time difference of one stroke between the
beginning of the output sequences of the two types. Disregarding the empty
input-output sequence, equivalence between the two types can be defined as follows:


Definition 2.2: Two machines $M$ and $M'$ are state-equivalent if to every state
$s_i$ of $M$ there corresponds a state $s_j$ of $M'$, and vice versa, such that
$p^{M}_{s_i}(v|u) = p^{M'}_{s_j}(v|u)$ for every input-output pair $(v, u)$ with $l(v, u) \ge 1$.


Let $M$ be an SSM of Moore-type, $M = (S, X, Y, \{A(x)\}, \Lambda)$. Define an SSM
$M'$ of Mealy-type as follows: $M' = (S, X, Y, \{A'(y|x)\})$ where $S$, $X$, and $Y$ are
as in $M$, but the entries of the matrices $A'(y|x) = [a'_{ij}(y|x)]$ are defined by

$$a'_{ij}(y|x) = \begin{cases} a_{ij}(x), & \text{if } y = \Lambda(s_j) \\ 0, & \text{otherwise} \end{cases}$$

It is left as an exercise to show that the machines $M$ and $M'$ are state-equivalent.

Let $M$ be a Mealy-type SSM, $M = (S, X, Y, \{A(y|x)\})$. Define an SSM $M'$
of the Moore type as follows: $M' = (S', X, Y, \{A'(x)\}, \Lambda)$ where $X$ and $Y$ are
as in $M$; $S'$ is the cartesian product $S \times Y$; the ($|S| \cdot |Y|$-dimensional) matrices
$A'(x)$ are defined as

$$A'(x) = \begin{bmatrix} A(y_1|x) & A(y_2|x) & \cdots & A(y_k|x) \\ \vdots & \vdots & & \vdots \\ A(y_1|x) & A(y_2|x) & \cdots & A(y_k|x) \end{bmatrix}$$

where $y_1, \ldots, y_k$ is the sequence of symbols in $Y$; finally, $\Lambda$ is the function
$\Lambda(s_i, y_j) = y_j$ for all $i$. It is left to the reader to show that the machines $M$ and
$M'$ are state-equivalent.
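
The two constructions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout (matrices keyed by input symbol, or by the pair (output, input)), the ordering of the product states, and the numerical values of the demo machine are all choices made for this sketch and are not taken from the text.

```python
from fractions import Fraction as F

def moore_to_mealy(A, out, Y):
    """A: dict x -> n x n matrix; out[j] is the symbol printed on entering state j."""
    n = len(out)
    # a'_ij(y|x) = a_ij(x) if y is the output of state j, and 0 otherwise
    return {(y, x): [[A[x][i][j] if out[j] == y else F(0) for j in range(n)]
                     for i in range(n)]
            for x in A for y in Y}

def mealy_to_moore(A, X, Y, n):
    """A: dict (y, x) -> n x n matrix. The product state (s_i, y_a) gets index a*n + i."""
    A2 = {}
    for x in X:
        # every block row equals [A(y_1|x) ... A(y_k|x)]: the stored output
        # component does not influence the next move
        A2[x] = [[A[(y, x)][i][j] for y in Y for j in range(n)]
                 for _ in Y for i in range(n)]
    out = [y for y in Y for _ in range(n)]   # output function: (s_i, y) -> y
    return A2, out

# illustrative two-state Moore machine (values chosen for this sketch only)
A = {0: [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]],
     1: [[F(2, 3), F(1, 3)], [F(1, 2), F(1, 2)]]}
mealy = moore_to_mealy(A, out=[1, 0], Y=[0, 1])
moore, out2 = mealy_to_moore(mealy, X=[0, 1], Y=[0, 1], n=2)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the stochasticity of every row can be checked by equality rather than within a floating-point tolerance.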

Inasmuch as every Moore-type SSM has a Mealy-type equivalent and vice 
versa, either type will be used at convenience for proving properties of machines 
in general. 

It is easy to see that the above definitions of Moore and Mealy types gener- 
alize the corresponding definitions of deterministic machines. On the other 
hand, since the stochastic machines are more elaborate in structure than deter- 
ministic machines, further generalized definitions are possible. Consider, for 
example, the following: 


Definition 2.3: An output-independent SSM is one such that the matrices $A(y|x)$
can be written in the form $A(y|x) = I(y|x) A(x)$ where the $A(x)$ are stochastic,
and the $I(y|x)$ are diagonal matrices with $\sum_{y} I(y|x) = I$ (= the $|S|$-dimensional
unit matrix).


The interpretation of this definition is as follows: Let $a_i(y|x)$ be the $i$th diagonal
entry in $I(y|x)$, and $a_{ij}(x)$ the $(i, j)$th entry in $A(x)$; then $a_{ij}(y|x)$ is given
by

$$a_{ij}(y|x) = a_i(y|x)\, a_{ij}(x) \tag{12}$$

If $a_i(y|x)$ is interpreted as the probability of the output being $y$ if the input
is $x$ and the current state is $s_i$, and $a_{ij}(x)$ as the probability of the next state being $s_j$
if the current state is $s_i$ and the input is $x$, then (12) means that the two random
variables are mutually independent, that is, the next state of the machine is
independent of its output for a given current state and input.

It is clear that the output-independent machines as defined in Definition 2.3 
provide another generalization of deterministic Mealy-type machines. On the 
other hand, the two generalizations are not equivalent. We will show now that 
although every output-independent machine is an SSM, the converse is not al- 
ways true. 

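Lemma 2.1: If $M$ is an output-independent machine, then for any degenerate
initial distribution $\bar{s}_i$ the value $p_{\bar{s}_i}(yv|xu)/p_{\bar{s}_i}(y|x)$ does not depend on $y$,
provided $p_{\bar{s}_i}(y|x) \ne 0$.


Proof: Let $a_i(y|x)$ be the $i$th diagonal entry of $I(y|x)$; then $a_i(y|x) = \bar{s}_i I(y|x)\eta$ and

$$\bar{s}_i I(y|x) = a_i(y|x)\,\bar{s}_i = (\bar{s}_i I(y|x)\eta)\,\bar{s}_i$$

But

$$\bar{s}_i A(y|x)\eta = \bar{s}_i I(y|x) A(x)\eta = \bar{s}_i I(y|x)\eta$$

since $A(x)$ is stochastic, so that $A(x)\eta = \eta$. Combining these equalities, we have

$$p_{\bar{s}_i}(yv|xu) = \bar{s}_i A(y|x) A(v|u)\eta = \bar{s}_i I(y|x) A(x) A(v|u)\eta = (\bar{s}_i I(y|x)\eta)\,\bar{s}_i A(x) A(v|u)\eta$$
$$= (\bar{s}_i A(y|x)\eta)\,\bar{s}_i A(x) A(v|u)\eta = p_{\bar{s}_i}(y|x)\,\bar{s}_i A(x) A(v|u)\eta$$

or

$$\frac{p_{\bar{s}_i}(yv|xu)}{p_{\bar{s}_i}(y|x)} = \bar{s}_i A(x) A(v|u)\eta$$

and the right-hand side does not depend on $y$.
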
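Example 2: Let $M$ be the SSM with $S = \{s_1, s_2\}$, $X = \{a\}$, $Y = \{0, 1\}$, and
$2 \times 2$ matrices $A(0|a)$ and $A(1|a)$ [the matrix entries are illegible in this copy].

Assume also that the initial distribution is $\bar{s}_1 = (1\ 0)$. Then

$$p_{\bar{s}_1}(0|a) = (1\ 0)\,A(0|a)\,\eta = \frac{5}{12}, \qquad p_{\bar{s}_1}(00|aa) = (1\ 0)\,A(0|a)\,A(0|a)\,\eta = \frac{3}{16}$$

$$p_{\bar{s}_1}(1|a) = (1\ 0)\,A(1|a)\,\eta = \frac{7}{12}, \qquad p_{\bar{s}_1}(10|aa) = (1\ 0)\,A(1|a)\,A(0|a)\,\eta = \frac{1}{4}$$

thus

$$\frac{p_{\bar{s}_1}(00|aa)}{p_{\bar{s}_1}(0|a)} = \frac{3/16}{5/12} = \frac{9}{20}$$

and

$$\frac{p_{\bar{s}_1}(10|aa)}{p_{\bar{s}_1}(1|a)} = \frac{1/4}{7/12} = \frac{3}{7}$$

The two values are not equal, hence, by Lemma 2.1, the given machine is not output-independent.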

Another Mealy-type SSM can be defined by requiring that the entries in the
matrices $I(y|x)$ be either 0 or 1, and another Moore-type SSM by assuming that
the function $\Lambda$ in Definition 2.1 is probabilistic [see Exercises 6 and 7 at the
end of this section].


EXERCISES 


1. Find a Moore-type machine which is equivalent to the machine in Exam- 
ple 1. 

2. Given the Moore-type machine $M = (S, X, Y, \{A(x)\}, \Lambda)$ with $S = \{s_1, s_2\}$,
$X = \{0, 1\}$, $Y = \{a, b\}$,


_ i 4 - 1 0 
40) =|) 4 a=, i 


and $\Lambda(s_1) = a$, $\Lambda(s_2) = b$, find an equivalent Mealy-type machine.


3. Prove that the interpretation of $A(x)$ and (11) implies that $a_{ij}(u)$ is the
probability of the machine moving from state $s_i$ to $s_j$ when fed the word $u$.


4. Prove that every Mealy-type machine has an equivalent Moore-type machine 
and vice versa, using the construction given in the text. 


5. For the machine given in problem above, compute the following values: 
a. ps,(abb|010) 
b. q(a|011, bb) 
c. r(a|1101) 

[For the definition of q and r, see Exercise 5 in Section 1.] 


6. Consider the following: 


Definition: An SSM is of the Mealy-type with probabilistic output if the matrices
$A(y|x)$ can be written in the form $A(y|x) = A(x) I(y)$, $A(x)$ being stochastic
and the $I(y)$ diagonal matrices with $\sum_{y \in Y} I(y) = I$.

Show that, under proper interpretation, the output of such a machine is a
[probabilistic] function of the next state and independent of the transition
imposed on the current state by the input.


7. Show that Example 2 can be represented as a Mealy-type SSM with
probabilistic output.


3. Synthesis of Stochastic Machines 


In the two methods for synthesizing stochastic sequential machines presented 
below, the machines are assumed to be of the Moore-type. 


a. Method 1 


Method 1 is illustrated in Figure 1. Let $M = (S, X, Y, \{A(x)\}, \Lambda)$ be a machine,
$Z$ an auxiliary alphabet with $|S| = n$ symbols, and $p(s_i, x)$ an independent
information source emitting the symbol $z_j \in Z$ with probability $a_{ij}(x)$.


[Figure 1 (diagram not reproduced): state box (delays), input lines X, output lines Y.]

Figure 1. Schematic representation of a network synthesizing an SSM.


The source box emits all sources $p(s_i, x)$, each of them through a separate line.
The box marked "I-logic" is a combinatorial network whose output is that
emitted by source $p(s_i, x)$ if the feedback input is $s_i$ and the $X$ input is $x$. The
"state box" is a combination of delays (or flip-flops) representing the states of
the machine. If the input to this box is $z_j$, the delays are set so as to represent
the state $s_j$, the feedback being a signal representing the current state of the
machine. Finally, the "O-logic" box is a combinatorial gate simulating the
function $\Lambda$.

It is clear that the above diagram synthesizes the given machine. It follows
from the construction that pr[next state $s_j$ | current state $s_i$, input $x$] = pr[I-logic
output $z_j$ | current state $s_i$, input $x$] = pr[$z_j$ | the source emits $p(s_i, x)$] = $a_{ij}(x)$
as required.

The procedure above involves synthesis of combinatorial networks with or 
without feedback, and construction of information sources with prescribed 
probability distributions, for which the reader is referred to Harrison (1965), 
Hartmanis and Stearns (1966), or McCluskey (1965), and to Gill (1962b, 1963), 
Sheng (1965), Tsertsvadze (1963), or Warfield (1965), respectively. It will now
be shown that the procedure can be simplified by means of the following lemma. 


Lemma 3.1: Any $m \times n$ stochastic matrix $A$ can be expressed in the form
$A = \sum_i p_i U_i$ where $p_i > 0$, $\sum_i p_i = 1$, the $U_i$ are degenerate stochastic matrices (with
entries either zero or one), and the number of matrices $U_i$ in the expansion is
at most $m(n - 1) + 1$.


Proof: Let $A = [a_{ij}]$, and let $U_1 = [u_{ij}]$ be the degenerate stochastic matrix such that

$$u_{ij} = \begin{cases} 1, & \text{if } a_{ij} \text{ is the first maximal element in the } i\text{th row of } A \\ 0, & \text{otherwise} \end{cases}$$

Let $p_1$ be the value $p_1 = \min_i \max_j a_{ij}$; then clearly $A - p_1 U_1$ is a matrix with
nonnegative entries. Moreover, if $p_1 < 1$, then $A_1 = [1/(1 - p_1)][A - p_1 U_1]$ is a stochastic
matrix (for the sum of entries in any row of $A - p_1 U_1$ equals $1 - p_1$) with
more zero entries than the original matrix $A$, and $A = p_1 U_1 + (1 - p_1)A_1$. The
procedure is now repeated for $A_1$ as the new $A$, represented in the form
$A_1 = p_2 U_2 + (1 - p_2)A_2$, with $A_2$ again stochastic and with more zeros than $A_1$. In this
manner at most $m(n - 1)$ steps yield a matrix $A_k$ that is itself a degenerate
stochastic matrix $U_{k+1}$. The required expansion is thus found with at most
$m(n - 1) + 1$ matrices $U_i$.
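
The procedure in the proof is easy to mechanize. The following Python sketch follows the steps of the proof literally (first maximal element in each row, $p = \min_i \max_j a_{ij}$, renormalization); the function name `resolve` and the $4 \times 2$ demo matrix are assumptions introduced for illustration only.

```python
from fractions import Fraction as F

def resolve(A):
    """Return [(p_1, U_1), (p_2, U_2), ...] with A = sum_k p_k U_k and sum_k p_k = 1."""
    A = [[F(a) for a in row] for row in A]
    weight, terms = F(1), []          # weight = product of the factors (1 - p) so far
    while True:
        cols = [row.index(max(row)) for row in A]        # first maximal element per row
        p = min(row[c] for row, c in zip(A, cols))       # p = min_i max_j a_ij
        U = [[1 if j == c else 0 for j in range(len(row))]
             for row, c in zip(A, cols)]
        terms.append((weight * p, U))
        if p == 1:                    # the current matrix is already degenerate
            return terms
        # A <- (A - p U) / (1 - p): stochastic again, with at least one more zero
        A = [[(a - p * u) / (1 - p) for a, u in zip(row, urow)]
             for row, urow in zip(A, U)]
        weight *= 1 - p

# illustrative 4 x 2 stochastic matrix
A_demo = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)], [F(2, 3), F(1, 3)], [F(1, 2), F(1, 2)]]
terms = resolve(A_demo)
```

For the demo matrix the loop stops after four terms, within the bound $m(n - 1) + 1 = 5$ of the lemma.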


Example 3: Let A be the matrix 


PN 
ll 
we wi 
o a 
Mie a 


&- 
al 
Ne 


then 


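hence,

$$A_1 = \begin{bmatrix} 0 & \tfrac12 & \tfrac12 \\ 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & 0 \end{bmatrix}, \qquad U_2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \qquad p_2 = \tfrac12$$

And the resulting resolution is

$$A = \tfrac13 U_1 + \tfrac23\left[\tfrac12 U_2 + \tfrac12 U_3\right] = \tfrac13 U_1 + \tfrac13 U_2 + \tfrac13 U_3$$
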
Note that although the above example is a square matrix, this requirement is 
not essential and the procedure works for any stochastic matrix. 

We now apply Lemma 3.1 to the procedure. To this end, let $A$ be the
stochastic matrix whose rows are the probabilistic distribution vectors $p(s_i, x)$,
i.e., $A$ has $|S| \times |X|$ rows and $|S|$ columns, and can be expressed in the form
$A = \sum_i p_i U_i$ according to the lemma. Let $W = \{w_1, \ldots, w_t\}$ be an auxiliary
alphabet with $t$ symbols, one for each matrix $U_i$ in the expansion of $A$, and let
$p$ be a single information source over $W$ emitting the symbol $w_i$ with probability
$p_i$.

The combinatorial logic is constructed so that its output is $z_j$ for input
$(x_l, w_m, s_i)$ if and only if the entry of matrix $U_m$ in the row corresponding to
$(s_i, x_l)$ and in the column corresponding to $s_j$ equals one (notation: $u^m_{(i,l),j} = 1$);
the state box and the O-logic are as in Figure 1. We have that pr(next state $s_j$ |
current state $s_i$, input $x_l$) = pr($W$-input is $w_m$ with $u^m_{(i,l),j} = 1$) = $\sum p_m$, where
the summation is over all $m$ with $u^m_{(i,l),j} = 1$. This sum, however, equals the
corresponding entry in $A$, which is $p_j(s_i, x_l)$ as required.

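[Figure 2 (diagram not reproduced): source $p$ over $W$, inputs X, combinatorial logic, delays (states).]

Figure 2. Simplified network for an SSM.


Example 4: Let $M = (S, X, Y, \{A(x)\}, \Lambda)$ be an SSM with $S = \{0, 1\} = X = Y$,
$\Lambda(0) = 1$, $\Lambda(1) = 0$, and

$$A(0) = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac34 \end{bmatrix}, \qquad A(1) = \begin{bmatrix} \tfrac23 & \tfrac13 \\ \tfrac12 & \tfrac12 \end{bmatrix}$$

then $A = [p_j(s_i, x)]$, $s_i \in S$, $x \in X$, or

$$A = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac34 \\ \tfrac23 & \tfrac13 \\ \tfrac12 & \tfrac12 \end{bmatrix}$$

Applying the resolution of Lemma 3.1, we get

$$A = \tfrac12 \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix} + \tfrac14 \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} + \tfrac16 \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} + \tfrac1{12} \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$$

thus $W = \{w_1, w_2, w_3, w_4\}$ and $p = (\tfrac12, \tfrac14, \tfrac16, \tfrac1{12})$. Encoding the symbols in $W$ as
00, 01, 10, 11 respectively we get the transition table, Table I.


Table I Transition Table for the machine in Example 4.

 w    x   s (current)   s (next)   Output
 00   0       0            0         1
 00   0       1            1         0
 00   1       0            0         1
 00   1       1            0         0
 01   0       0            1         1
 01   0       1            0         0
 01   1       0            1         1
 01   1       1            1         0
 10   0       0            1         1
 10   0       1            1         0
 10   1       0            0         1
 10   1       1            1         0
 11   0       0            1         1
 11   0       1            1         0
 11   1       0            1         1
 11   1       1            1         0


Now using the Karnaugh map method or other methods we obtain a network which
synthesizes the given SSM, as shown in Figure 3.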


[Figure 3 (diagram not reproduced): the source emits $w$ = 00, 01, 10, 11 with probabilities $p(w)$ = 1/2, 1/4, 1/6, 1/12; the network is built from "and" gates, "or" gates, inverters, and unit delays.]

Figure 3. Realization of transition Table I.


b. Method 2

Given the machine $M = (S, X, Y, \{A(x)\}, \Lambda)$ with $n = |S|$, expand all matrices $A(x)$ in the
form $A(x) = \sum_i p_i^x U_i^x$ using Lemma 3.1. Assuming that the above expansions
all have the same matrix $U_i$ in the $i$th place for all $i$ [i.e., the values $p_i^x$, but not
the matrices $U_i^x$, depend on $x$], the restriction on $p_i^x$ is weakened to $p_i^x \ge 0$.
[This is possible because there are only a finite number of different matrices of
the form $U_i^x$, and some zero-valued $p_i^x$ may be added if necessary to meet the
requirements.] Let $Z$ be an auxiliary alphabet with $q$ symbols, where
$q = \max\{i : \text{there exists } x \in X \text{ such that } p_i^x \ne 0 \text{ in the expansion of } A(x)\} \le n^n$,
the number of degenerate stochastic matrices of order $n$.

We define the deterministic Moore-type sequential machine $\mathcal{M}$ as follows:
$\mathcal{M} = (S, Z, Y, \delta, \Lambda)$, where $S$, $Y$, and $\Lambda$ are the same as in $M$, $Z$ is the
auxiliary alphabet as specified above, and $\delta$ is the function defined by

$$\delta(s_i, z_k) = s_j \quad \text{if } u^k_{ij} = 1 \tag{13}$$

where $U_k = [u^k_{ij}]$ [by construction, $U_k^x = U_k$ does not depend on $x$]. Finally,
let $p(x)$ be an independent information source over $Z$ such that the probability
of $z_k$ being emitted by $p(x)$ is $p_k^x$. Consider now Figure 4. The source box here
emits all sources $p(x)$, each of them through a separate line. The I-logic is a
combinatorial gate whose output is that emitted by source $p(x)$ if the $X$ input
is $x$.

It is easily seen that the above diagram is a realization of $M$ (the states of $M$
being identified with those of $\mathcal{M}$), for if the current state of $\mathcal{M}$ is $s_i$, then its
next state is $s_j$ only if the input is $z_k$ and $\delta(s_i, z_k) = s_j$, or $u^k_{ij} = 1$ [see Eq. (13)].
But the probability of the input being $z_k$ is $p_k^x$, depending on the input symbol
$x$ of $M$. Therefore, pr(next state of $M$ is $s_j$ | current state of $M$ is $s_i$, input $x$) = $\sum_k p_k^x u^k_{ij} = a_{ij}(x)$
by the construction of the matrices $U_k$.
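
The argument above can be checked numerically. The sketch below builds the deterministic transition function $\delta$ of Eq. (13) from a common list of degenerate matrices and recomputes the transition matrix induced when the $Z$ input is drawn from a source $p(x)$. The helper names and the numerical values (four $2 \times 2$ degenerate matrices and two sources) are illustrative assumptions.

```python
from fractions import Fraction as F

def delta_from(U_list):
    """delta[(i, k)] = j exactly when the (i, j) entry of U_k equals one, as in Eq. (13)."""
    return {(i, k): row.index(1)
            for k, U in enumerate(U_list) for i, row in enumerate(U)}

def induced_matrix(delta, p_x, n):
    """Transition matrix observed when symbol z_k is fed with probability p_x[k]."""
    A = [[F(0)] * n for _ in range(n)]
    for (i, k), j in delta.items():
        A[i][j] += p_x[k]            # sum of p_k over all k with u^k_ij = 1
    return A

# illustrative data: common degenerate matrices U_1..U_4 and one source per input symbol
U = [[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[1, 0], [1, 0]], [[0, 1], [0, 1]]]
p = {0: [F(1, 2), F(1, 4), F(0), F(1, 4)],
     1: [F(1, 6), F(0), F(1, 2), F(1, 3)]}
delta = delta_from(U)
A0 = induced_matrix(delta, p[0], 2)
A1 = induced_matrix(delta, p[1], 2)
```

The deterministic machine is fixed once and for all; only the source feeding it changes with the input symbol, which is the point of the second method.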
[Figure 4 (diagram not reproduced): source box, input lines X, machine $\mathcal{M}$, output lines Y.]

Figure 4. Schematic representation of a network synthesizing an SSM according to
second procedure.


As in the preceding procedure, the above construction can be further simplified
by using Lemma 3.1 again and resolving, accordingly, the stochastic
matrix $A$ whose rows are the distributions $p(x)$. The resulting diagram will be
as in Figure 5.

[Figure 5 (diagram not reproduced): source $p$ ($W$ inputs) and inputs X feed a combinatorial network, which drives the deterministic machine $\mathcal{M}$ with outputs Y.]

Figure 5. Simplification of network in Figure 4.


Since the simplification follows the same course as in the preceding case, the
details are left to the reader.


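Example 5: Let $M$ be the same SSM as in Example 4. The second procedure
will be used.

$$A(0) = \tfrac12 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \tfrac14 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + 0 \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + \tfrac14 \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$$

$$A(1) = \tfrac16 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + 0 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \tfrac12 \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + \tfrac13 \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$$

Thus $p(0) = (\tfrac12, \tfrac14, 0, \tfrac14)$ and $p(1) = (\tfrac16, 0, \tfrac12, \tfrac13)$. Let $A = [p(x)]$, then

$$A = \tfrac12 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} + \tfrac14 \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} + \tfrac16 \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} + \tfrac1{12} \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Let $Z = \{z_1, z_2, z_3, z_4\}$, $W = \{w_1, w_2, w_3, w_4\}$, and assign the codes 00, 01, 10, 11
to $z_1, \ldots, z_4$ and to $w_1, \ldots, w_4$ respectively.


Table II Transition table for the machine in Example 5.

 Combinatorial network      Machine $\mathcal{M}$
 w    x    z                z    s (current)   s (next)   y
 00   0    00               00       0            0        1
 00   1    10               00       1            1        0
 01   0    01               01       0            1        1
 01   1    11               01       1            0        0
 10   0    11               10       0            0        1
 10   1    00               10       1            0        0
 11   0    11               11       0            1        1
 11   1    11               11       1            1        0


The combinatorial network and the machine $\mathcal{M}$ are given in the transition
tables, Table II. The synthesis of the machine $M$ is given in the network in
Figure 6.

[Figure 6 (diagram not reproduced): the source emits $W$ = 00, 01, 10, 11 with probabilities $p(W)$ = 1/2, 1/4, 1/6, 1/12 and feeds the combinatorial network, which drives the machine $\mathcal{M}$.]

Figure 6. Realization of Transition Table II.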

c. Comparison of Methods 


The methods given above are obviously not exhaustive. Another alternative
with the SSM in its Mealy-type form is as follows: The matrices $A(y|x)$ are
arranged in the form of a single matrix

$$A = \begin{bmatrix} A(y_1|x_1) & \cdots & A(y_k|x_1) \\ \vdots & & \vdots \\ A(y_1|x_m) & \cdots & A(y_k|x_m) \end{bmatrix}$$

where $Y = \{y_1, \ldots, y_k\}$ and $X = \{x_1, \ldots, x_m\}$. $A$ is stochastic and can be resolved
according to Lemma 3.1, after which the process is continued along the same
lines as in the original procedure (details are left to the reader).

In the deterministic case the most common measure of complexity of a ma- 
chine is the number of its states; it is evident, however, from the above consid- 
erations that other factors (such as the number of gates in the resulting network 
or its type), should also be taken into consideration. 

For example, the degree of simplicity of the network is governed not only by
the realization method used, but also by the assignments prescribed for the state
variables and inputs (both original and auxiliary). Still another likely factor is
the number of symbols in the auxiliary alphabet $W$ appearing in all the above
methods as a random source with prescribed probabilities for each symbol. It
is easily seen that from this viewpoint the first method is preferable, since by it
Lemma 3.1 is applied to a matrix $A$ with $|X| \times |S|$ rows and $|S|$ columns, so
that $|W| \le (|S| - 1)(|X| \times |S|) + 1$, whereas by the second method the lemma
is first applied to the matrices $A(x)$; since there are at most $|S|^{|S|}$ degenerate
stochastic matrices of order $|S|$, the auxiliary alphabet $Z$ has at most that many
symbols, and the resulting matrix $A$ has $|X|$ rows and at most $|S|^{|S|}$ columns.
Resolution of the resulting matrix $A$ yields

$$|W| \le (|S|^{|S|} - 1)|X| + 1$$

a much higher bound than in the first case, which proves our claim.


EXERCISES 


1. Given the SSM, $M = (S, X, Y, \{A(x)\}, \Lambda)$ with $S = \{s_1, s_2, s_3\}$, $X = \{0, 1\}$,
$Y = \{a, b\}$,


$ $0 à 0 íi 
A0) —|0 i $$, 4Q)=|4 2 0 
1 0 4 i1 1 5 
2. 7 4 8 8 


and A(s,) = A(s;) = b, A(s;) = a, give a synthesis of M using the first method. 
2. As above, using the second method. 


3. Prove that if $\{A_i\}$ is a set of stochastic matrices and $(p_i)$ is a probabilistic
vector of dimension equal to the number of matrices in the set, then $\sum_i p_i A_i$ is
a stochastic matrix.


4. Let M, and M, be machines over the same input alphabet X = {x,, x,} and 
output alphabet Y = {0, 1} respectively. Let the transition matrices be 
0 i1 2 000 
0 4d i01 
a i oH 0 0 0 
M(0|x) — | dg 2s 12$, Mio) =| ste $$$ riv 
lds 0 £& 0 2 0 
[0 4 1 1 (0000 
0 4 O0 +004 
M,(0|x,) = g SI Max) =| * z 
0 d 01 440 
LO wy 4 0 lb 40 4 
FO: 2: eue ub fo 0 0 O 
0 1 0 d 1 0 4 1 
MO\x,)=| 7 . 2b Müx- i... 
p 6 0 4 0 3 6 
04 0 ¢ lt Of d 


Transform M, and M, into Moore-type machines and find the random distribu- 
tion over W according to the second method. Show that although |S^^| < 
ISM [W^e| > |W, 

5. Prove that if some of the input lines of a deterministic sequential machine 
are induced by a random independent source, the resulting machine is an SSM. 


6. Work out in detail the construction of the network in Figure 6 according 
to the second method. 


7. As above using the method described in Subsection 3,d. 


4. Bibliographic Notes 


Subsections 1 and 2 of Section A are based on the work of Carlyle (1961)
with additions and examples suggested by Rabin (1963), Zadeh (1963b), Starke
(1965), and Salomaa (1968). Subsection 3 is based in part on the work of Nieh
and Carlyle (1968), Cleave (1962), and Davis (1961). Some additions in this
section are new and the synthesis procedure suggested in Subsection 3d is due
to Carlyle (private communication). Further references: Booth (1964, 1965,
1967), Gill (1962b, 1963), Harrison (1965), Hartmanis and Stearns (1966),
McCluskey (1965), Sheng (1965), Sklansky and Kaplan (1963), Tsertsvadze
(1963), and Warfield (1965).


B. STATE THEORY AND EQUIVALENCE


1. Set $K^M$ and Matrix $H^M$


From this section on the machines to be considered are of the Mealy type 
unless otherwise specified. 


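Definition 1.1: Given a machine $M$, $K^M$ denotes the ordered infinite set

$$K^M = (\eta^M(\lambda|\lambda), \ldots, \eta^M(y|x), \ldots, \eta^M(v|u), \ldots)$$

such that all vectors of the form $\eta^M(v|u)$ for all pairs $(v, u)$ are in the set and
the order is induced by some fixed lexicographic order on the pairs $(v, u)$.
$K^M(m)$ denotes the ordered subset of $K^M$ such that $\eta^M(v|u) \in K^M(m)$ implies
that $l(v, u) \le m$ and the order in $K^M(m)$ is the same as in $K^M$. $[K^M]$ denotes
the [infinite] matrix whose $i$th column is the $i$th element of $K^M$.

Let $\mathcal{S}(m)$ be the linear space spanned by the vectors in $K^M(m)$ ($\mathcal{S}$ denotes the
space spanned by all vectors in $K^M$). Then rank $\mathcal{S}(i) \le$ rank $\mathcal{S}(j)$ if $i \le j$,
and rank $\mathcal{S}(m) \le n = |S|$ for $m = 0, 1, \ldots$. Furthermore, it is readily seen
that if $\mathcal{S}(i) = \mathcal{S}(i + 1)$ for some $i$, then $\mathcal{S}(i) = \mathcal{S}(i + j)$ for $j = 1, 2, \ldots$.
To prove this assertion, we observe that

$$\eta \in \mathcal{S}(i + 2) \implies \eta = \sum_j a_j\, \eta(v_j|u_j) \quad \text{and} \quad l(v_j, u_j) \le i + 2$$

Writing $(v_j, u_j) = (y_j v_j', x_j u_j')$, we have

$$\eta = \sum_j a_j A(y_j|x_j)\, \eta(v_j'|u_j') \quad \text{and} \quad l(v_j', u_j') \le i + 1$$

$$= \sum_j a_j A(y_j|x_j) \sum_k b_{jk}\, \eta(v_k|u_k)$$

and

$$l(v_k, u_k) \le i \quad (\text{for } \mathcal{S}(i) = \mathcal{S}(i + 1))$$

so that

$$\eta = \sum_j \sum_k b_{jk} a_j A(y_j|x_j)\, \eta(v_k|u_k) = \sum_j \sum_k b_{jk} a_j\, \eta(y_j v_k|x_j u_k)$$

and $l(y_j v_k, x_j u_k) \le i + 1$. Thus, $\eta \in \mathcal{S}(i + 1) = \mathcal{S}(i)$ and the assertion follows.
The above considerations show that there exists an integer $m$ such that

$$1 = \text{rank } \mathcal{S}(0) < \text{rank } \mathcal{S}(1) < \cdots < \text{rank } \mathcal{S}(m) = \text{rank } \mathcal{S}(m + 1) = \text{rank } \mathcal{S}(m + 2) = \cdots = \text{rank } \mathcal{S} \le n$$

also implying that $m \le n - 1$.
It is thus possible to find a set of linearly independent vectors in $K^M(n - 1)$
such that any vector in $K^M$ is a linear combination of these vectors.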


Definition 1.2: Let $\eta_1, \ldots, \eta_m$ be a set of vectors having the following properties:
1. $\eta_1$ is the vector $\eta(\lambda|\lambda)$.
2. $\eta_1, \ldots, \eta_m$ are the first vectors in $K^M$ (in the order of the vectors in it) which are
linearly independent and span the whole set.
The matrix $H^M$ is defined as

$$H^M = [\eta_1\ \eta_2\ \cdots\ \eta_m] = [h_{ij}], \qquad i = 1, \ldots, n; \quad j = 1, \ldots, m \le n$$

Thus, $H^M$ is such that $h_{i1} = 1$ for $i = 1, 2, \ldots, n$; $0 \le h_{ij} \le 1$ for all $i$ and
$j$; the vectors $\eta_j$ are elements of $K^M$ and linearly independent, and any vector
of the form $\eta(v|u)$ is a linear combination of them; finally, the rank of $H^M$ is
$m \le n$.

In the sequel, when referring to the rank of a machine $M$, we refer to that
of its $H^M$ matrix.
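
One way to compute a matrix with the properties listed above is sketched below in Python. It is an assumption-laden illustration, not the author's algorithm: it starts from the all-ones column $\eta(\lambda|\lambda)$ and repeatedly multiplies the vectors found so far by the matrices $A(y|x)$, keeping every product that is linearly independent of the current columns. The columns obtained are vectors of $K^M$ that span it, although they need not be the first such vectors in a given lexicographic order. The names `h_matrix`, `rank`, and `mat_vec` and both demo machines are hypothetical.

```python
from fractions import Fraction as F

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[F(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def h_matrix(A, n):
    """A: dict (y, x) -> n x n matrix. Returns the rows of a matrix whose columns span K^M."""
    basis = [[F(1)] * n]              # eta(lambda|lambda): the all-ones column
    queue = list(basis)
    while queue:
        v = queue.pop(0)
        for key in sorted(A):
            w = mat_vec(A[key], v)    # eta(y v'|x u') = A(y|x) eta(v'|u')
            if rank(basis + [w]) > len(basis):
                basis.append(w)
                queue.append(w)
    return [[col[i] for col in basis] for i in range(n)]

# two illustrative single-input, two-output machines
M1 = {(0, 'a'): [[F(1, 2), 0], [0, F(1, 2)]], (1, 'a'): [[0, F(1, 2)], [F(1, 2), 0]]}
M2 = {(0, 'a'): [[1, 0], [0, 0]], (1, 'a'): [[0, 0], [0, 1]]}
H1, H2 = h_matrix(M1, 2), h_matrix(M2, 2)
```

Since each accepted vector raises the rank by one, at most $n$ columns are ever kept and the loop terminates.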


Example 6: If the matrices of a single-input two-output machine M are 


1.9 4 0 4 0 
A*(y)|0 0 O0, — 4V(y) 210 1 0 
$056 0% 0 


then its H^ matrix is 


13 
H^ —|1 0 
1 4 


Straightforward computation shows that by multiplying any of the matrices
$A(y_1)$ or $A(y_2)$ by any of the column vectors of $H^M$ [which are a subset of
$K^M(1)$], we have a new vector linearly dependent on the columns of $H^M$. It
follows that the spaces spanned by $K^M(1)$ and $K^M(2)$ coincide (and equal the
space spanned by $K^M$), which proves that the given matrix $H^M$
has all the required properties.


EXERCISES 


1. Construct a step-by-step algorithm for finding a matrix $H^M$ for a given
machine $M$.


2. Find an $H^M$ matrix for the machine whose matrices are


$ 0$0 [$00 4 
2 0 4 0 1 0 0 i 
Anl) =|? . 3 p AG =| 5 x 
$050 $00 4 
$050 i$ 0 0 3 
0 4 0 3 ro 0 0 0 
0000 4 4 4 0 
A = i A(y,|x2) = 
(yix) 0401 (yilx3) l e a d 
0303 ity tz ty 0 
3. Given a matrix $H = [h_{ij}]$ such that $h_{i1} = 1$ for all $i$, $0 \le h_{ij} \le 1$, and all
its columns are linearly independent vectors, show that a machine $M$ can be
constructed effectively such that the given matrix $H$ is its $H^M$ matrix.


4. Find a machine $M$ whose $H^M$ matrix is


I ż ł 

Hu — 1 1 1 

100 

13$ 
5. Let $M$ be a machine and $\pi$ an initial distribution for $M$. Define the ordered
set of row vectors $G^{(M,\pi)} = (\bar{\pi}(\lambda, \lambda), \ldots, \bar{\pi}(y, x), \ldots, \bar{\pi}(v, u), \ldots)$, such that
all vectors of the form $\bar{\pi}(v, u)$ for all pairs $(v, u)$ are in the set [if for some pair
$(v, u)$ the vector $\bar{\pi}(v, u)$ is not defined, then set $\bar{\pi}(v, u) = \bar{\pi}(\lambda, \lambda)$ for this
input-output pair], and the order is induced by a fixed lexicographic order on the pairs
$(v, u)$. Show that a matrix $G^{(M,\pi)} = [g_{ij}]$ can be found effectively such that its
first row is $\bar{\pi}(\lambda, \lambda)$, $0 \le g_{ij} \le 1$ for all $i$ and $j$, all its rows are linearly
independent vector elements of $G^{(M,\pi)}$, and any vector of the form $\bar{\pi}(v, u)$ is a linear
combination of the rows of $G^{(M,\pi)}$.


6. Construct a step-by-step algorithm for finding a matrix $G^{(M,\pi)}$ for a given
machine and a given initial distribution $\pi$.


7. Find the matrix G^^? for the machine whose matrices are as in Exercise 2, 
with distribution z = (4, 4, 4, 0). 


2. Equivalence and Minimization of States 


Definition 2.1: Let $\pi$ and $\rho$ be two initial distributions for a given machine. $\pi$
and $\rho$ are called $k$-equivalent distributions if the functions $p_\pi(v|u)$ and $p_\rho(v|u)$
[see (7)] have the same values for all pairs $(v, u)$ such that $l(v, u) \le k$. $\pi$ and $\rho$
are called equivalent distributions if the functions $p_\pi(v|u)$ and $p_\rho(v|u)$ have the
same values for all pairs $(v, u)$. We are now able to prove the following theorem:


Theorem 2.1: Two distributions $\pi$ and $\rho$ for a given machine are equivalent if
and only if they are $(n - 1)$-equivalent, where $n$ is the number of states of the
machine.


Proof: The "only if" part of the theorem is trivial. Assume now that the
condition of the theorem holds, i.e., $p_\pi(v|u) = p_\rho(v|u)$ for all pairs $(v, u)$ with
$l(v, u) \le n - 1$. This implies that $\pi\eta(v|u) = \rho\eta(v|u)$ for all pairs $(v, u)$ with
$l(v, u) \le n - 1$, so that $\pi H^M = \rho H^M$. [The columns of $H^M$ are, by construction,
of the form $\eta(v|u)$ with $l(v, u) \le n - 1$.] Let $(v, u)$ be any input-output
pair, then $\eta(v|u) = \sum_i a_i \eta_i$ where the $\eta_i$ are the columns of $H^M$. It follows
that

22 Chapter I. Stochastic Sequential Machines 


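$$p_\pi(v|u) = \pi\eta(v|u) = \pi \sum_i a_i \eta_i = \sum_i a_i\, \pi\eta_i = \sum_i a_i\, \rho\eta_i = \rho \sum_i a_i \eta_i = \rho\eta(v|u) = p_\rho(v|u)$$


Corollary 2.2: Two initial vectors $\pi$ and $\rho$ for a given machine $M$ are equivalent
if and only if $\pi H^M = \rho H^M$.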


Remark: An interesting geometrical interpretation of the above theorem and 
corollary derives from the following considerations. 
Let 


1 0 
H*c 
13 


for some machine $M$, and consider Figure 7. The set of all possible distribution
vectors for $M$ is represented by the simplex $\mathcal{P}$. Any point $\bar{x}$ on the simplex
satisfies the equation $\bar{x}\eta_1 = 1$, $x_i \ge 0$. Any point $\bar{x}$ also satisfying the equation
$\bar{x}\eta_2 = c$ for some real number $c$ must lie on the intersection of the simplex
with the plane $\bar{x}\eta_2 = c$. The equivalence classes of initial distributions are
therefore represented by parallel segments in the simplex, and their number is
infinite.

Figure 7. Geometrical interpretation of distribution equivalence.

Let $M$ be a machine and let $\xi_i(y|x)$ be the $i$th row (assumed to be a nonzero
row) in the matrix $A(y|x)$. Let $\xi'$ be a substochastic vector with the property
$\xi_i(y|x) H^M = \xi' H^M$, and let $M'$ be a machine derived from $M$ by replacing the
row $\xi_i(y|x)$ of $A(y|x)$ with the row $\xi'$. We have the following:

Theorem 2.3: The machines $M$ and $M'$ as above are state-equivalent.

Proof: It suffices to prove that the equality

$$\eta^M(v|u) = \eta^{M'}(v|u) \tag{14}$$

holds for any pair $(v, u)$. If the pair of symbols $(y, x)$ does not appear in $(v, u)$,
then (14) holds trivially, for nothing is changed in the matrix $A^{M'}(v|u)$. Assume
now that the pair of symbols $(y, x)$ appears only once in $(v, u)$, i.e., $(v, u) =
(v_1 y v_2, u_1 x u_2)$ where $y$ does not appear in $v_1$ or $v_2$ and $x$ does not appear in $u_1$
or $u_2$. Then

$$\eta^M(v|u) = \eta^M(v_1 y v_2|u_1 x u_2) = A^M(v_1|u_1)\, A^M(y|x)\, \eta^M(v_2|u_2)$$

By the definition of $M'$, we have that $A^M(y|x) H^M = A^{M'}(y|x) H^M$, so that
$A^M(y|x)\, \eta^M(v_2|u_2) = A^{M'}(y|x)\, \eta^M(v_2|u_2)$, as $\eta^M(v_2|u_2)$ is a linear combination of
the columns of $H^M$. It follows that

$$\eta^M(v|u) = A^M(v_1|u_1)\, A^{M'}(y|x)\, \eta^M(v_2|u_2) = A^{M'}(v_1|u_1)\, A^{M'}(y|x)\, \eta^{M'}(v_2|u_2) = \eta^{M'}(v|u)$$

The theorem follows since the above argument is readily extended by induction
to the general case.


Theorem 2.4: Let $M$ be an $n$-state machine such that two rows of $H^M$ are
identical. Then an $(n - 1)$-state machine $M^*$ can be effectively constructed
such that $M$ and $M^*$ are state-equivalent (see Definition A.2.2).


Proof: Let $\xi$ be a row in a matrix $A(y|x)$ of $M$. Assume that the rows $j$ and
$k$ of $H^M$ are identical; then the coefficients of $\xi_j$ and $\xi_k$ in the summation
$\sum_i \xi_i h_{iq}$, $q = 1, 2, \ldots, m$, are also identical. Replace the vector $\xi$ in $A(y|x)$ with
a new vector $\xi'$ such that $\xi'_k = 0$, $\xi'_j = \xi_j + \xi_k$, and $\xi'_i = \xi_i$ otherwise. Then,

$$\sum_i \xi_i h_{iq} = \xi_1 h_{1q} + \cdots + \xi_j h_{jq} + \cdots + \xi_k h_{kq} + \cdots + \xi_n h_{nq}$$
$$= \xi_1 h_{1q} + \cdots + (\xi_j + \xi_k) h_{jq} + \cdots + 0 \cdot h_{kq} + \cdots + \xi_n h_{nq} = \sum_i \xi'_i h_{iq}$$

with $q = 1, 2, \ldots, m$, or $\xi H^M = \xi' H^M$. The resulting machine $M'$ is therefore
state-equivalent to the original machine $M$ by the previous theorem, but the
$k$th columns in all matrices of $M'$ are zero columns. Let $M^*$ be the system
derived from $M'$ by deleting all $k$th rows and $k$th columns in the matrices of
$M'$. $M^*$ is clearly an $(n - 1)$-state machine, for deletion of the zero columns
of the matrices of $M'$ does not affect the relations $\sum_{y} \sum_j a_{ij}(y|x) = 1$;
moreover, $M^*$ is state-equivalent to $M'$ by the correspondence in which both
$s_j^{M'}$ and $s_k^{M'}$ correspond to $s_j^{M^*}$, and $s_i^{M'}$ corresponds to $s_i^{M^*}$ for $k \ne i \ne j$.
This follows from the fact that

$$p^{M'}_{s_j}(v|u) = \bar{s}_j\, \eta^{M'}(v|u) = \bar{s}_k\, \eta^{M'}(v|u) = p^{M'}_{s_k}(v|u)$$

[the $j$th and $k$th entries in any column of $H^M$ are identical, hence this holds
also for any vector of the form $\eta^{M'}(v|u)$], and $p^{M'}_{s_i}(v|u) = p^{M^*}_{s_i}(v|u)$ for $i \ne k$ by
construction. The theorem follows by the transitivity of state equivalence.
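
The construction in the proof amounts to two matrix operations: add column $k$ to column $j$, then delete row $k$ and column $k$. The Python sketch below does exactly that for every matrix of the machine; the function name `merge_states`, the zero-based state indices, and the three-state demo machine (whose states 1 and 2 behave identically) are assumptions made for illustration.

```python
from fractions import Fraction as F

def merge_states(A, j, k):
    """Fold state k into state j; rows j and k of H^M are assumed to be identical."""
    assert j != k
    merged = {}
    for key, M in A.items():
        rows = []
        for i, row in enumerate(M):
            if i == k:
                continue                  # delete the kth row
            new = list(row)
            new[j] = new[j] + new[k]      # xi'_j = xi_j + xi_k (so that xi'_k = 0)
            del new[k]                    # delete the kth column, now a zero column
            rows.append(new)
        merged[key] = rows
    return merged

# illustrative machine: state 0 prints 0 and moves to state 1 or 2 with probability 1/2 each;
# states 1 and 2 both print 1 and return to state 0
M3 = {(0, 'a'): [[0, F(1, 2), F(1, 2)], [0, 0, 0], [0, 0, 0]],
      (1, 'a'): [[0, 0, 0], [1, 0, 0], [1, 0, 0]]}
M_star = merge_states(M3, j=1, k=2)
```

Applied repeatedly, until no two rows of $H^M$ coincide, this yields a machine in reduced form.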


Definition 2.2: A machine $M$ is in reduced form if no two rows of $H^M$ are
identical [i.e., no two of its states are equivalent]. The following corollary is a
direct consequence of Definition 2.2 and Theorem 2.4.


Corollary 2.5: Every machine M has a reduced-form state-equivalent machine. 


Definition 2.3: An initiated stochastic sequential machine (ISSM) is an SSM 
combined with a fixed initial distribution. 

Definition 2.4: Two ISSMs $(M, \pi)$ and $(M^*, \pi^*)$ are $k$-equivalent if $p^M_\pi(v|u) =
p^{M^*}_{\pi^*}(v|u)$ for all pairs $(v, u)$ with $l(v, u) \le k$. They are equivalent if the above
equality holds for all pairs $(v, u)$.


Definition 2.5: A state $s_i$ of an ISSM $(M, \pi)$ is accessible if there exists an
input-output pair $(v, u)$ [the pair $(\lambda, \lambda)$ included] such that $\pi_i(v|u) \ne 0$.


Definition 2.6: An ISSM is connected if all its states are accessible.


Theorem 2.6: If $s_i$ is an accessible state of an ISSM $(M, \pi)$, then there exists an
input-output pair $(v, u)$ with $l(v, u) \le |S| - 1$ such that $\pi_i(v|u) \ne 0$.


Proof: If $s_i$ is accessible by an input-output pair $(v, u)$ such that $l(v, u) = m$,
then there exists a sequence of states of length $m + 1$, $s_{i_1}, s_{i_2}, \ldots, s_{i_{m+1}}$, such that
$s_{i_1}$ corresponds to a nonzero entry in $\pi$, $s_{i_{m+1}} = s_i$, and there is a positive
probability of transition by the corresponding input-output pair from one state in the
sequence to the next. If $m > |S| - 1$, then the graph connecting that sequence
of states contains a loop which can be reduced to yield a shorter input-output
pair $(v', u')$ by which $s_i$ is accessible. Proceeding in this way, an input-output
pair $(v, u)$ with $l(v, u) \le |S| - 1$ can be found by which $s_i$ is accessible.

Remark: It follows from the above theorem that the set of accessible states
of a given ISSM $(M, \pi)$ is the set of states corresponding to nonzero entries in
the vectors $\pi(v|u)$ where $l(v, u) \le |S| - 1$. A practical method for determining
the accessible states of $(M, \pi)$ is thus available.
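
A minimal Python sketch of such a method follows. Instead of enumerating all pairs $(v, u)$ it propagates the set of states with nonzero probability one step at a time, which reaches the same states and, by Theorem 2.6, stops after at most $|S| - 1$ rounds. The function name `accessible_states`, the zero-based state indices, and the demo machine are illustrative assumptions.

```python
from fractions import Fraction as F

def accessible_states(A, pi):
    """A: dict (y, x) -> n x n matrix; pi: initial distribution. Returns sorted state indices."""
    n = len(pi)
    reached = {i for i in range(n) if pi[i] != 0}    # accessible by the pair (lambda, lambda)
    frontier = set(reached)
    while frontier:
        # states entered with positive probability from the frontier by some pair (y, x)
        nxt = {j for i in frontier for M in A.values()
               for j in range(n) if M[i][j] != 0} - reached
        reached |= nxt
        frontier = nxt
    return sorted(reached)

# illustrative single-input, single-output machine: state 2 cannot be entered from states 0 and 1
M_acc = {(0, 'a'): [[F(1, 2), F(1, 2), 0], [0, 1, 0], [F(1, 2), 0, F(1, 2)]]}
from_s0 = accessible_states(M_acc, [1, 0, 0])
from_s2 = accessible_states(M_acc, [0, 0, 1])
```

Deleting the rows and columns of the states missing from the returned list gives a connected machine, as in the theorem that follows.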


Theorem 2.7: Every ISSM has an equivalent connected ISSM. 


Proof: We first observe that if $s_j$ is not an accessible state, then the $j$th entry
in $\pi$ is necessarily zero, so that the vector $\pi'$ derived from $\pi$ by deleting that
entry is a stochastic vector. We note next that if $s_j$ is not accessible and, for
some pair $(y, x)$, $a_{ij}(y|x) > 0$, then $s_i$ is not accessible either. Given an ISSM
$(M, \pi)$, let $(M', \pi')$ be the initiated machine such that:

a. $\pi'$ is derived from $\pi$ by deleting all entries corresponding to nonaccessible
states.
b. The matrices of $(M', \pi')$ are derived from the matrices of $(M, \pi)$ by deleting
all rows and columns corresponding to nonaccessible states.

It is clear that $(M', \pi')$ is the required ISSM because, by the previous remarks, if a deleted
column has nonzero entries, then all rows corresponding to these entries are also
deleted, so that the resulting matrices have the property that $\sum_{y \in Y} A'(y|x)$
is a stochastic matrix, as required.


Definition 2.7: Let $A$ and $B$ be two square matrices of order $r$ and $s$ respectively.
The matrix

$$A \dotplus B = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$

of order $r + s$ is called their direct sum, and has the following properties:

a. If $A$ and $B$ are stochastic matrices, then so is $A \dotplus B$.
b. $(A_1 \dotplus B_1)(A_2 \dotplus B_2) = A_1 A_2 \dotplus B_1 B_2$ (provided the pairs $A_1$ and $A_2$, $B_1$
and $B_2$, are each of the same order).

These properties are readily verified.

Definition 2.8: Let $M = (S, X, Y, \{A(y|x)\})$ and $M' = (S', X, Y, \{A'(y|x)\})$ be
two SSMs. The machine $M \dotplus M' = (S \cup S', X, Y, \{A(y|x) \dotplus A'(y|x)\})$ is
called their direct sum.

Theorem 2.7: Two ISSMs (M, π) and (M', π') are equivalent if and only if
they are (|S| + |S'| − 1)-equivalent.

Proof: The "only if" part of the theorem is trivial. Assume now that the
condition of the theorem holds. Let M* be the direct sum M ∔ M' and let ρ
and ρ' be the (|S| + |S'|)-dimensional vectors

$$\rho = (\pi_1, \ldots, \pi_{|S|}, 0, \ldots, 0), \qquad \rho' = (0, \ldots, 0, \pi'_1, \ldots, \pi'_{|S'|})$$

where

$$\pi = (\pi_1, \ldots, \pi_{|S|}) \quad \text{and} \quad \pi' = (\pi'_1, \ldots, \pi'_{|S'|}).$$

Then it is readily seen that p^{M*}_ρ(v|u) = p^M_π(v|u) and p^{M*}_ρ'(v|u) = p^{M'}_π'(v|u).
Therefore, assuming that (M, π) and (M', π') are (|S| + |S'| − 1)-equivalent,
we have that p^{M*}_ρ(v|u) = p^M_π(v|u) = p^{M'}_π'(v|u) = p^{M*}_ρ'(v|u) for all pairs (v, u)
with l(v, u) ≤ |S| + |S'| − 1. Thus ρ and ρ' are (|S| + |S'| − 1)-equivalent
distributions for M*. The theorem now follows, using Theorem 2.1, and bearing
in mind that M* has |S| + |S'| states. ∎
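The theorem turns equivalence of two initiated machines into a finite test. A minimal sketch follows, assuming the usual convention that p_π(v|u) is obtained by multiplying π by the matrices A(y_1|x_1), ..., A(y_k|x_k) in order and summing the entries of the result. The function names and sample machines are hypothetical, and the enumeration is exponential in the bound, so this is only practical for very small machines.

```python
from fractions import Fraction as F
from itertools import product

def p_value(pi, matrices, pairs):
    """p_pi(v|u) for the sequence of (y, x) pairs given in `pairs`."""
    row = list(pi)
    for key in pairs:
        A = matrices[key]
        row = [sum(row[i] * A[i][j] for i in range(len(row)))
               for j in range(len(A[0]))]
    return sum(row)

def equivalent_issm(pi1, M1, pi2, M2):
    """Compare the two ISSMs on all pairs of length <= |S| + |S'| - 1."""
    assert M1.keys() == M2.keys()
    bound = len(pi1) + len(pi2) - 1
    for k in range(bound + 1):
        for pairs in product(sorted(M1), repeat=k):
            if p_value(pi1, M1, pairs) != p_value(pi2, M2, pairs):
                return False
    return True

# Hypothetical machines over one input "a" and outputs 0, 1.
one_state = {(0, "a"): [[F(1, 2)]], (1, "a"): [[F(1, 2)]]}
two_state = {(0, "a"): [[F(1, 4), F(1, 4)], [F(1, 2), 0]],
             (1, "a"): [[F(1, 2), 0], [0, F(1, 2)]]}
biased = {(0, "a"): [[1]], (1, "a"): [[0]]}
```

Both states of `two_state` emit each output with probability 1/2, so any initial distribution on it is equivalent to the one-state machine, whereas `biased` already differs on pairs of length 1.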

Notation: For a given machine M, ℱ^M denotes the set of all functions
ℱ^M = {p^M_π : π ∈ 𝒫_n}.
Definition 2.9: Two machines M and M' are equivalent if ℱ^M = ℱ^{M'}. In other
words, for every distribution π there is a distribution π' and vice versa such
that (M, π) and (M', π') are equivalent ISSMs.


26 Chapter I. Stochastic Sequential Machines 


Remarks 


1. It is readily seen that ℱ^M is closed under convex combinations. To show
this, we observe that if p = (p_i) ∈ 𝒫_k, then

$$\sum_i p_i\, p^M_{\pi^i}(v|u) = \sum_i p_i \bigl(\pi^i \eta^M(v|u)\bigr) = \sum_i p_i \sum_j \pi^i_j\, \eta^M_j(v|u) = \sum_j \Bigl(\sum_i p_i \pi^i_j\Bigr) \eta^M_j(v|u) = \sum_j p'_j\, \eta^M_j(v|u) = p^M_{p'}(v|u)$$

where p'_j = Σ_i p_i π^i_j and therefore p' = (p'_j) ∈ 𝒫_n.


2. By p^M_π(v|u) = Σ_i π_i p^M_{s_i}(v|u), we have that the set ℱ^M is the convex
closure of the set of functions {p^M_{s_1}, p^M_{s_2}, ..., p^M_{s_n}} = ℱ_S^M. [A function of the
form p^M_{s_i} will be called an extremal function.]


3. In terms of the sets ℱ_S^M, state equivalence of two machines M and M'
signifies that ℱ_S^M = ℱ_S^{M'}, hence [by the previous remarks] state equivalence
implies equivalence. The converse, however, is not true, for the elements of
ℱ_S^M [or of ℱ_S^{M'}] need not be convexly independent. [A set is convexly
independent if no element of the set is a convex combination of the other
elements.]


4. The following two conditions are equivalent for two machines M and M':
(a) ℱ^M = ℱ^{M'}
(b) ℱ_S^M ⊂ ℱ^{M'} and ℱ_S^{M'} ⊂ ℱ^M
The proof is left to the reader.
Theorem 2.8: Let M be an n-state machine such that some row of H^M is a convex
combination of the other rows. Then there exists an (n − 1)-state machine
M' equivalent to M.


Proof: Let h_1, ..., h_n be the rows of H^M, and assume that h_i = Σ_{j≠i} a_j h_j
with (a_j) ∈ 𝒫_n and a_i = 0. Thus, conv(h_1, ..., h_n) = conv(h_1, ..., h_{i−1}, h_{i+1}, ..., h_n).
Let ξ be any nonzero row vector in any matrix of M; then ξ/Σ_j ξ_j is a vector
in 𝒫_n, so that Σ_j (ξ_j/Σ_k ξ_k) h_j ∈ conv(h_1, ..., h_n) = conv(h_1, ..., h_{i−1}, h_{i+1}, ..., h_n).
It follows that there exists a vector ρ ∈ 𝒫_n with ρ_i = 0 and
Σ_j (ξ_j/Σ_k ξ_k) h_j = Σ_{j≠i} ρ_j h_j. Thus, ξH^M = (Σ_j ξ_j)ρH^M, and replacing the
vectors ξ in the matrices of M with the corresponding vectors (Σ_j ξ_j)ρ, we have
a state-equivalent machine M' (see Theorem 2.3) such that the ith columns in all
its matrices are zero columns. M and M' are therefore equivalent machines (see
Remark 3 above). Let (M', π) be an ISSM derived from M'. By the same argument
as above, we find that there exists a vector π' ∈ 𝒫_n with π'_i = 0 such that
(M', π) is equivalent to (M', π'). Now the state s_i for (M', π') is not
accessible, hence there exists an equivalent ISSM (M*, π*) with (n − 1) states
only, by Theorem 2.7. The theorem follows by the transitivity of equivalence. ∎


Definition 2.10: A machine M is in minimal-state form if the set of row vectors
in H^M is convexly independent.
Corollary 2.9: Every machine M has an equivalent minimal-state form machine
M'.

Example: Let M be a machine with one input and two output symbols, de- 
fined by the matrices. 


1001 DT 
0000 1000 
A = , ACy |x) = 

(yilx) 40034 (y2|x) 0000 
1004 530259] 

] 3 

ecd 

] 1 

1 3 


The first and last rows of H™ are identical, hence the machine can be reduced 
to the state-equivalent machine M' [which is in reduced form] as described in 
the text, with 


i00 4 00 
A(y,|x) =|0 0 i A(yi|x) 2]1 0 0 
100 000 

l-$ 

H* =|1 0 

i.d 


The first row of H' is a convex combination of the other two, so that M' is 
state-equivalent to M”, with 


0 1 0 


aj 
a 


1 
4 
A"(y|lx)-|0 0 O0,  A"(ysx)-[0 à 4), HY’ = H" 
0 + £ 00 0 


Now let z = (7, 7, 7) be any distribution for M", then 
n* = (02, + i2, n, + i73) 
has the property 7H”” = z* H™”, so that z and z* are equivalent vectors. But 


the first state is not accessible in (M"', z*), hence M" is equivalent to M* [a 
connected and minimal-state form machine] with 


. M*(y,)x) F | 
" x)= 
* J2 0 0| 


Remark: Given a reduced machine M, in order to find its minimal-state form 
equivalent machine M' one must be able to find the (unique) set of convexly 


1 
2 


M*(yilx) = h 
independent [extremal] vectors among the row vectors of H^M. This problem can
be solved by linear programming methods [e.g., Vajda (1961)]. Clearly, a vector
h_i^M is a convex combination of the other row vectors of H^M if there exists
a solution to the following linear programming problem: Find a vector x =
(x_1, ..., x_n) such that Σ_j x_j = 1, x_i = 0, x_j ≥ 0 for all j and xH^M = h_i^M.
Some of the extremal rows of H^M can, however, be found by simpler methods
[see Exercises 7, 8, and 9 below].
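For very small matrices the feasibility question above can also be settled without a linear programming solver. The sketch below is not the linear programming method mentioned in the text: it substitutes an exhaustive search based on Carathéodory's theorem, trying every subset of the other rows and solving the resulting linear system exactly. The function names and the sample rows are hypothetical, and the running time is exponential in the number of rows.

```python
from fractions import Fraction as F
from itertools import combinations

def solve_unique(cols, target):
    """Solve sum_k x_k * cols[k] = target exactly.

    Returns the solution if it exists and is unique, otherwise None.
    """
    m, k = len(target), len(cols)
    aug = [[F(cols[c][r]) for c in range(k)] + [F(target[r])] for r in range(m)]
    row = 0
    for col in range(k):
        piv = next((r for r in range(row, m) if aug[r][col] != 0), None)
        if piv is None:
            return None  # free variable: the chosen points are dependent
        aug[row], aug[piv] = aug[piv], aug[row]
        pv = aug[row][col]
        aug[row] = [a / pv for a in aug[row]]
        for r in range(m):
            if r != row and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[row])]
        row += 1
    if any(aug[r][k] != 0 for r in range(row, m)):
        return None  # inconsistent system
    return [aug[c][k] for c in range(k)]

def is_convex_combination(rows, i):
    """True if rows[i] is a convex combination of the other rows."""
    others = [r for j, r in enumerate(rows) if j != i]
    target = list(rows[i]) + [1]  # extra coordinate forces the weights to sum to 1
    for size in range(1, min(len(others), len(target)) + 1):
        for subset in combinations(others, size):
            x = solve_unique([list(r) + [1] for r in subset], target)
            if x is not None and all(v >= 0 for v in x):
                return True
    return False

# Hypothetical rows: the first is the midpoint of the other two.
rows = [(1, F(1, 2)), (1, 0), (1, 1)]
```

With these rows, only the first one is reported as a convex combination of the others; the remaining two are extremal.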


EXERCISES 


1. Find the reduced form and minimal-state form machine equivalent to the 
one defined by the matrices 


4 4 io 00% i 
1 1 1 QO 0 0 1 4 
Axb)-|) 1 , gp =l a 
9 6 6 0 i i 
tz ds tz 0 00581 


for the above ma- 


~— 


Find a distribution 2, equivalent to the distribution (1 4 $ à 
chine and such that z = (7, 7t; 7, n4) and zt, = T, = 0. 

2. Construct an algorithm for finding the set of all nonaccessible states of a 
given ISSM. 


3. Prove the relations (a) and (b) after Definition 2.7. 
4. Prove the assertion in Remark 4. 


5. Let f_1, ..., f_k be functions. Prove that f_i ∈ conv(f_1, ..., f_k) implies that
f_i ∈ conv(f_1, ..., f_{i−1}, f_{i+1}, ..., f_k), unless f_i is an extremal function.


6. Prove that the relation conv(ℱ_S^M) = conv(ℱ_S^{M'}) for two given minimal-state
form machines implies that ℱ_S^M = ℱ_S^{M'}, with the following consequences:
a. All minimal-state form equivalent machines are state-equivalent and have
the same number of states.
b. If M and M' are equivalent machines and M is minimal-state form, then
|S| ≤ |S'|.

7. Prove that if a row in some H^M has an entry which is maximal or minimal
in the corresponding column, then that row is extremal (i.e., is not a convex
combination of other rows).

8. Let d_i be the value d_i = Σ_k (h_ik^M)^2 for a given row h_i^M in a matrix H^M.
Prove that the rows corresponding to the maximal d_i values are extremal.

9. Let h_i^M be an extremal row in a matrix H^M, and let d_ij be the value d_ij =
Σ_k (h_ik − h_jk)^2 where h_j is some other row of H^M. Prove that the rows h_j
corresponding to maximal d_ij values are extremal.

10. Find a set of extremal column vectors in the matrix ((H^M)^T denotes the
transpose of H^M).


I i mox 
(HY =|0 3 3 3 5 $ 
l d 14 i 3X i1 
2 4 3 d 4 3 


11*. Let L be a linear space over the real numbers and a an arbitrary fixed
element of L. The set of elements {y: y = x + a, x ∈ L} is called a translate
of L or a flat. Prove that

a. A set of points in n-dimensional space, which is closed under convex com- 
bination of its points, is a flat. 

b. Let Z, be the flat (hyperplane) Z, = (x = (2,,...,7,): 3,7; = 1}, and 
M an n-state SSM. Define an equivalence relation over Z,* induced by M which 
is right-invariant and such that ,~ is decomposed by this equivalence relation 
into a cartesian product of two flats, the elements of the first flat being the 
equivalence classes of the defined equivalence. 


3. Covering Relations 


Definition 3.1: Let M and M* be two SSMs. The machine M covers the machine
M* (M ≥ M*) if ℱ^M ⊇ ℱ^{M*}.
Theorem 3.1: The following four conditions are equivalent:
a. M ≥ M*.
b. There exists a stochastic matrix B such that Bη^M(v|u) = η^{M*}(v|u) for all
pairs (v, u) (i.e., B[K^M] = [K^{M*}]).
c. There exists a stochastic matrix B such that
B A^M(y|x) η^M(v|u) = A^{M*}(y|x) B η^M(v|u)
for all pairs (v, u) and all pairs (y, x).
d. There exists a stochastic matrix B such that
B A^M(y|x) H^M = A^{M*}(y|x) B H^M
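Proof: (a) ⇔ (b): Assume that (a) holds. Then ℱ^M ⊇ ℱ^{M*}.
Let η^{M*}(v|u) be a vector in K^{M*}; then

$$\eta^{M^*}(v|u) = \begin{bmatrix} p^{M^*}_{s_1}(v|u) \\ \vdots \\ p^{M^*}_{s_{n^*}}(v|u) \end{bmatrix} = \begin{bmatrix} p^{M}_{\pi^1}(v|u) \\ \vdots \\ p^{M}_{\pi^{n^*}}(v|u) \end{bmatrix} = \begin{bmatrix} \pi^1 \eta^M(v|u) \\ \vdots \\ \pi^{n^*} \eta^M(v|u) \end{bmatrix} = \begin{bmatrix} \pi^1 \\ \vdots \\ \pi^{n^*} \end{bmatrix} \eta^M(v|u) = B\eta^M(v|u)$$

where p^M_{π^i} is the function in ℱ^M equal to p^{M*}_{s_i} in ℱ^{M*} and B the matrix
whose rows are the vectors π^i. Thus (a) implies (b). Assume now that (b) holds
and let p^{M*}_{π*} be any function in ℱ^{M*}. Let π be the distribution π = π*B; then,
for all pairs (v, u), we have

$$p^M_{\pi}(v|u) = \pi \eta^M(v|u) = \pi^* B \eta^M(v|u) = \pi^* \eta^{M^*}(v|u) = p^{M^*}_{\pi^*}(v|u)$$

Thus ℱ^M ⊇ ℱ^{M*} and (b) implies (a).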
(b) ⇔ (c): Assume that (b) holds, and consider Figure 8. It follows directly
from this diagram that (b) implies (c).

Figure 8. Mapping B from M to M*.

We now prove by induction on the length of pairs (v, u), that (c) implies (b).
For l(v, u) = 0 the implication is trivial, as both η^M(λ|λ) and η^{M*}(λ|λ) have
all their entries equal to 1, and therefore for any stochastic matrix B of
suitable dimension Bη^M(λ|λ) = η^{M*}(λ|λ). Assuming that the equality
Bη^M(v|u) = η^{M*}(v|u) holds for some pair (v, u) with l(v, u) = k and (c), we
have that

$$B\eta^M(yv|xu) = BA^M(y|x)\eta^M(v|u) = A^{M^*}(y|x)B\eta^M(v|u) = A^{M^*}(y|x)\eta^{M^*}(v|u) = \eta^{M^*}(yv|xu)$$

as necessary. The implication is thus proved.

(c) ⇔ (d): That (c) implies (d) is trivial, as the columns of H^M are vectors
of the form η^M(v|u). The converse is also obvious as any vector of the form
η^M(v|u) is a linear combination of the columns of H^M. ∎
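Condition (b) can be spot-checked numerically for a candidate matrix B. The sketch below assumes that η(v|u) is the column vector obtained by applying A(y_1|x_1)···A(y_k|x_k) to the all-ones column; the function names and both machines are hypothetical. It tests only pairs up to a chosen length, so a positive answer is evidence rather than a proof.

```python
from fractions import Fraction as F
from itertools import product

def eta(matrices, pairs, n):
    """Column vector eta(v|u) = A(y1|x1) ... A(yk|xk) applied to all-ones."""
    col = [1] * n
    for key in reversed(pairs):
        A = matrices[key]
        col = [sum(A[i][j] * col[j] for j in range(n)) for i in range(n)]
    return col

def times(B, col):
    """Matrix-vector product B * col."""
    return [sum(b * c for b, c in zip(row, col)) for row in B]

def satisfies_b(B, M, M_star, max_len):
    """Check B eta^M(v|u) == eta^{M*}(v|u) for all pairs of length <= max_len."""
    n, n_star = len(B[0]), len(B)
    for k in range(max_len + 1):
        for pairs in product(sorted(M), repeat=k):
            if times(B, eta(M, pairs, n)) != eta(M_star, pairs, n_star):
                return False
    return True

# Hypothetical two-state machine M and one-state machine M_star.
M = {(0, "a"): [[F(1, 4), F(1, 4)], [F(1, 2), 0]],
     (1, "a"): [[F(1, 2), 0], [0, F(1, 2)]]}
M_star = {(0, "a"): [[F(1, 2)]], (1, "a"): [[F(1, 2)]]}
M_bad = {(0, "a"): [[1]], (1, "a"): [[0]]}
B = [[F(1, 3), F(2, 3)]]
```

Here `B` maps every η^M(v|u) of the two-state machine onto the corresponding vector of `M_star`, while no such agreement holds for `M_bad`.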


Definition 3.2: Given two machines M ≥ M*, J^{M*} is the matrix whose columns
are vectors in K^{M*} which are related to the same input-output pairs as the
columns in H^M.


Theorem 3.2: Let M ≥ M* be two machines such that rank M = rank M*; then
there exists a stochastic matrix B such that BH^M = H^{M*}. Furthermore, if
η^M(v|u) = Σ_{i=1}^m a_i η_i (the η_i's being the columns of H^M) is a vector in K^M and
η^{M*}(v|u) = Σ_{i=1}^m b_i η_i* the corresponding vector in K^{M*}, then a_i = b_i for
i = 1, ..., m.

Proof: Let B be the matrix in condition (b) of Theorem 3.1. Then J^{M*} =
BH^M. Denote the columns of H^M by η_1, ..., η_m and the corresponding columns
of J^{M*} by η_1*, ..., η_m*. Finally, let η^M(v|u) be any vector in K^M. Then

$$\eta^M(v|u) = \sum_{i=1}^{m} a_i \eta_i \tag{15}$$

this implies that

$$\eta^{M^*}(v|u) = B\eta^M(v|u) = \sum_{i=1}^{m} a_i B\eta_i = \sum_{i=1}^{m} a_i \eta_i^* \tag{16}$$

Thus any vector in K^{M*} is a linear combination of the columns of J^{M*} and
therefore, since M and M* have a common rank, rank J^{M*} = rank H^{M*} = rank
H^M. Furthermore, the columns in H^{M*} must be columns in J^{M*} (in the same
order). If this is not true, then let η^{M*}(v_0|u_0) be the first column in H^{M*} which
is not a column in J^{M*}. The corresponding vector η^M(v_0|u_0) in K^M is, by
definition, not in H^M, hence it is a linear combination of vectors in H^M preceding
the vector η^M(v_0|u_0) in K^M. This would imply by (16) that η^{M*}(v_0|u_0), a column
of H^{M*}, is a linear combination of other columns of H^{M*}, contrary to the
properties of H^{M*}. Thus the columns of H^{M*} are a subset of those of J^{M*}. Now
this subset cannot be proper, for rank H^M = number of columns in H^M =
number of columns in J^{M*} = rank J^{M*} = rank H^{M*} = number of columns in
H^{M*}. The second part of the theorem is an immediate consequence of the first
part and of relations (15) and (16) above. ∎


Theorem 3.3: Let M and M* be two equivalent SSMs with n and n* states
respectively. Then rank M = rank M*, conv(h_1^M, ..., h_n^M) = conv(h_1^{M*}, ...,
h_{n*}^{M*}) (where h_i^M and h_i^{M*} are the ith rows in H^M and H^{M*} respectively), and
there are stochastic matrices B and B* such that H^{M*} = BH^M and H^M =
B*H^{M*}.

Proof: M ≡ M* implies that M ≥ M* and M* ≥ M. By Theorem 3.1, there
exist stochastic matrices B and B* such that η^{M*}(v|u) = Bη^M(v|u) and η^M(v|u) =
B*η^{M*}(v|u) for all pairs (v, u). This implies that rank M* ≥ rank M ≥ rank
M*, or rank M* = rank M. By Theorem 3.2, H^{M*} = BH^M and H^M = B*H^{M*},
signifying that every row of H^M is a convex combination of rows of H^{M*} and
vice versa, or, conv(h_1^M, ..., h_n^M) = conv(h_1^{M*}, ..., h_{n*}^{M*}). ∎


Theorem 3.4: Let M and M* be two state-equivalent machines with n and n*
states respectively. Then {h_1^M, ..., h_n^M} = {h_1^{M*}, ..., h_{n*}^{M*}}.


Proof: The entries in the ith row of H^M are values of an extremal function
p^M_{s_i} ∈ ℱ_S^M for some input-output pairs (λ, λ), (v_1, u_1), ..., (v_{m−1}, u_{m−1}). As
M is state-equivalent to M*, it follows from Remark 3 on p. 26 that ℱ_S^M =
ℱ_S^{M*} so that there exists an extremal function p^{M*}_{s_j} ∈ ℱ_S^{M*} equal to p^M_{s_i}.
The entries in the ith row of H^M are therefore equal to those of the jth row of
the matrix J^{M*} whose columns are η^{M*}(λ|λ), η^{M*}(v_1|u_1), ..., η^{M*}(v_{m−1}|u_{m−1}).
Now, state-equivalence implies equivalence and therefore, by the previous
theorem, J^{M*} = H^{M*} so that there exists a row in H^{M*}, the jth row, identical
with the ith row in H^M. The proof is completed by reversing the argument. ∎
Theorem 3.5: Let M and M* be two state-equivalent and reduced machines
with n and n* states respectively; then n = n*; the rows of H^M are a permutation
of the rows of H^{M*}; and, if A^M(y|x) and A^{M*}(y|x) are corresponding matrices
of M and M* respectively, then A^M(y|x)H^M = A^{M*}(y|x)H^{M*} up to a
permutation of rows.


Proof: It follows by definition that no two rows of H^M and no two rows of
H^{M*} are identical (the machines are reduced). By the previous theorem
{h_1^M, ..., h_n^M} = {h_1^{M*}, ..., h_{n*}^{M*}}. Combining these facts, we have that n =
n* and the ordered set of rows of H^M is a permutation of the ordered
corresponding set of rows of H^{M*}. If the states of M* are properly ordered, then the
equality H^M = H^{M*} holds and, as the machines are state-equivalent and the
equivalence is one-to-one, we have that η^M(v|u) = η^{M*}(v|u) for all pairs (v, u).
By (5), A^M(y|x)H^M = A^{M*}(y|x)H^{M*}. The theorem is thus proved. ∎


Theorem 3.6: Let M and M* be two equivalent minimal-state form machines
with n and n* states respectively; then

a. n = n*.

b. M is state-equivalent to M*.

c. The corresponding matrices A^M(y|x) and A^{M*}(y|x) satisfy the relation
A^M(y|x)H^M = A^{M*}(y|x)H^{M*} up to a permutation of rows.

d. There exist permutation matrices B and B* such that H^M = B*H^{M*} and
H^{M*} = BH^M.

Proof: By Theorem 3.3, since M and M* are equivalent, we have that
conv(h_1^M, ..., h_n^M) = conv(h_1^{M*}, ..., h_{n*}^{M*}). By definition, points h_1^M, ..., h_n^M
are the vertices of the polyhedron conv{h_1^M, ..., h_n^M} and points h_1^{M*}, ...,
h_{n*}^{M*} those of conv{h_1^{M*}, ..., h_{n*}^{M*}}. As the set of vertices of a polyhedron is
uniquely determined by the polyhedron, we have that {h_1^M, ..., h_n^M} =
{h_1^{M*}, ..., h_{n*}^{M*}}. M and M* being minimal-state form, they are also reduced-form,
so that all points in either set on both sides of the above equality are
distinct. Thus n = n* and M is state-equivalent to M*. Properties (c) and (d)
now follow from the previous theorem. ∎
Corollary 3.7: Let M and M* be two equivalent machines such that M is a
minimal-state form machine. Then n ≤ n*.

Proof: The set {h_1^M, ..., h_n^M} is the unique set of vertices of the polyhedron
conv(h_1^M, ..., h_n^M) = conv(h_1^{M*}, ..., h_{n*}^{M*}), and the number of vertices of a
polyhedron is the smallest number of points such that their convex closure spans
the whole polyhedron. ∎

Remark: Compare the above theorem and corollary with Exercise 6 in the 
previous section. 

We now consider the uniqueness problem for reduced-form and minimal-state 
form machines. 
Definition 3.3: If H^M is a matrix related to a machine M and ξ is a row
substochastic nonzero vector of suitable dimension, then h^M(ξ) is the point
(ξ/Σ_i ξ_i)H^M in conv(h_1, ..., h_n). The vector ξ is simplicial if h^M(ξ) is a point
on a face of conv(h_1, ..., h_n) which is a simplex.


Definition 3.4: Two machines M and M' are isomorphic if they are equal up to 
a permutation of states. 


Theorem 3.8: Let M be a reduced-form machine. There exists a reduced-form
machine M* which is state-equivalent but not isomorphic to M if and only if
there exists a row ξ(y|x) in a matrix A^M(y|x) which is not simplicial.


Proof: Assume first that all the rows in the matrices A^M(y|x) are simplicial.
If M* is reduced and state-equivalent to M, then by Theorem 3.5, A^M(y|x)H^M =
A^{M*}(y|x)H^{M*} for all pairs (y, x) up to a proper rearrangement of states. Since
the rows of A^M(y|x) are simplicial, this is possible only if A^M(y|x) = A^{M*}(y|x),
for an interior point of a simplex has a unique representation as a combination
of its vertices. Thus M is isomorphic to M*. Assume now that there is a row
ξ(y|x) in a matrix A^M(y|x) which is not simplicial. This means that h(ξ(y|x)) =
Σ_i α_i h_i, where the h_i corresponding to nonzero coefficients α_i are not a simplex.
This implies, by a classical theorem on convex bodies (see Exercise 5 at the end
of this section), that there exists a set of coefficients (β_i) not identical to (α_i),
such that the combination Σ_i β_i h_i is convex and equals Σ_i α_i h_i. Thus there
exists a substochastic vector ρ not identical to ξ(y|x) and such that ξ(y|x)H^M =
ρH^M. Let M* be a machine derived from M by replacing the vector ξ(y|x) in
A(y|x) with the vector ρ. By Theorem 2.3, M and M* are state-equivalent, but
M* is not isomorphic to M by construction. ∎

Assume now that two equivalent machines M and M* are in minimal-state 
form. Then they are also in reduced form and state-equivalent [Theorem 3.6]. 
This observation leads to the following corollary. 


Corollary 3.9: Let M be a minimal-state form machine. There exists a
minimal-state form machine M* which is equivalent but not isomorphic to M if and
only if there exists a row ξ(y|x), in a matrix A(y|x) of M, which is not simplicial.


It follows from the above theorem and corollary that the uniqueness of the
reduced or minimal-state form of a machine M is conditional on the nature of
the points h^M(ξ(y|x)), where ξ(y|x) is a row in a matrix A^M(y|x). To find the
nature of these points, we must be able to extract from the set of points
(h_1^M, ..., h_n^M) (denoted by V throughout this subsection) all subsets W such
that conv(W) is a face of conv(V). This done, we have to decide whether the faces
conv(W) are simplexes or not. A decision procedure for these questions is based
on a theorem stated below. [The reader is referred to Grünbaum (1968) for
proof of the first part of the theorem.]

Let M and H^M be a machine and its corresponding H matrix, assumed to be
of dimension n × m. With H^M we associate a new matrix Ĥ^M such that the
columns of Ĥ^M form a basis for the null-space of the space spanned by the
columns of H^M. Clearly Ĥ^M is an n × (n − m) matrix. Let V̂ be the set of
rows of Ĥ^M [considered as points in (n − m)-space]. Let N' = (i_1, ..., i_k) be
a subset of the set of integers N = (1, ..., n). V(N') denotes the set V(N') =
(h_{i_1}, ..., h_{i_k}) and similarly V̂(N') = (ĥ_{i_1}, ..., ĥ_{i_k}) where V̂ = (ĥ_1, ..., ĥ_n).
Finally, V − W stands for the set V(N − N') where W = V(N').


Definition 3.5: The set of points conv(W) = conv(V(N’)) is a coface of conv 
(V) if and only if conv(V — W) is a face of conv(V). [We shall say, alterna- 
tively, that W is a coface of V.] 


Theorem 3.9: W = V(N') is a coface of V if and only if either W = ∅ or 0 is
in the relative interior of conv(V̂(N')). [The whole polyhedron is considered as a
face of itself.] A face V(N') = W of V is a simplex if and only if the set of its
vertices is linearly independent.


Remark: It is clear that the criteria used in this theorem are decidable and 
effectively checkable by straightforward linear programming methods. Note 
also that the second part of the theorem is a trivial consequence of the 
definitions. 


EXERCISES 


1. Let M be a reduced machine such that all entries in its matrices are either
0 or 1 (i.e., M is deterministic). Prove that M is also in minimal-state form.


2. Prove: If M is a reduced deterministic SSM, then no SSM M* such that
M* ≥ M has fewer states than M.

3. Prove: Let M and M* be two state-equivalent machines such that the mapping
between the states of M and those of M* is one-to-one; then for every pair
(y, x), A^M(y|x)H^M = A^{M*}(y|x)H^M, and A^{M*}(y|x)H^{M*} = A^M(y|x)H^{M*} up to a
permutation of rows.

4. Prove: M* > M if and only if F“ 2 F™. 

5. The following is Radon's classical theorem on convex bodies: 

Theorem: Each set of n + 2 or more points in n-dimensional space can be sub- 
divided into two disjoint sets whose convex closures have a common point. 

On the basis of this theorem, prove that for any row ξ(y|x) in a matrix
A(y|x) which is not simplicial, there exists a substochastic vector ρ not identical
to ξ(y|x), with h(ξ(y|x)) = h(ρ).

6. Prove: Let M be a machine. Construction of a reduced form by merging
equivalent states yields resultant machines which may be nonisomorphic only
if there exist two rows ξ_i(y|x) ≠ ξ_j(y|x) in a matrix A(y|x) which are not
simplicial and such that h(ξ_i(y|x)) = h(ξ_j(y|x)) and the states s_i and s_j are
equivalent.

7. Prove: Let M be a machine. Construction of a minimal-state form equivalent
machine yields resultant machines which may be nonisomorphic only if
there are two rows ξ_i(y|x) ≠ ξ_j(y|x) in a matrix A(y|x) which are not simplicial
and such that h(ξ_i(y|x)) = h(ξ_j(y|x)), the states s_i and s_j are equivalent and
h_i (= h_j) is a vertex of conv(h_1, ..., h_n).


Note: Is "h_i is a vertex of conv(h_1, ..., h_n)" a necessary condition? Explain.
8. Let H^M be the matrix


l ż 2 2 
130 3 
H”™=!1 0 0 1 
1 0 4 4 
1 0 0 1 
Find the faces of conv(h_1, ..., h_5) for the above matrix, and also which faces
are simplexes.
9. Consider the following: 


Definition: A machine M is observer/state-calculable if there exists a function
f: S × X × Y → S such that a_ij(y|x) = 0 if s_j ≠ f(s_i, x, y). Accordingly, such
a machine has at most one nonzero element in each row of its matrices. What
corollaries derive from Theorem 3.8 and Corollary 3.9 when applied to it?


10. Prove that the vertices of a polyhedron are uniquely determined by the 
polyhedron. 


11. Prove that any machine of rank 2 has an equivalent two-state, minimal- 
state form, machine. 


12. Prove that the covering relation is transitive. 


4. Decision Problems 


Theorem 4.1: Let M ≥ M* be two machines, and let B be any stochastic matrix
such that BH^M = J^{M*}. Then Bη^M(v|u) = η^{M*}(v|u) for all pairs (v, u).

Proof: By Theorem 3.1 (M ≥ M*) there exists a stochastic matrix B' such
that B'η^M(v|u) = η^{M*}(v|u), in particular B'H^M = J^{M*}. Thus B'H^M = J^{M*} =
BH^M, so that the rows of B considered as distributions for M are equivalent to
the corresponding rows of B'. It follows that Bη^M(v|u) = B'η^M(v|u) = η^{M*}(v|u)
for all pairs (v, u), and the theorem is proved. ∎
Corollary 4.2: Let M and M* be machines. If for some stochastic matrix B,
such that BH^M = J^{M*}, the condition BA^M(y|x)H^M = A^{M*}(y|x)BH^M does not
hold for all pairs (y, x), then M ≱ M*.


Proof: It is implicit in the proof of Theorem 3.1 that the matrix B satisfying
the relation (b) satisfies also the relation (d). Using the previous theorem, we
conclude that if M ≥ M* and BH^M = J^{M*}, then B must satisfy the relation
(b) in Theorem 3.1. The corollary is thus proved. ∎
Corollary 4.3: Let M and M* be machines. M ≥ M* if and only if there exists
a stochastic matrix B satisfying the conditions BH^M = J^{M*}, and for any such
B the condition (d) of Theorem 3.1 is satisfied.

Corollary 4.4: Given two machines M and M*, it is decidable whether M ≥
M*.

Proof: There exist algorithms for finding the matrices H^M and J^{M*}. Using
the preceding corollary, we see that if a stochastic matrix B such that BH^M =
J^{M*} does not exist, then M ≱ M*, and this question can be answered with the
aid of linear programming methods. If such a B does exist, it is again obtainable
by linear programming methods. Finally, with B found, M ≥ M* if and only if
the relation (d) in Theorem 3.1 holds for it. The corollary is thus
proved. ∎
Corollary 4.5: Given two machines M and M*, it is decidable whether M ≡
M*.

Proof: M ≡ M* if and only if M ≥ M* and M* ≥ M. ∎


EXERCISES 
1. Given two machines M and M*, formulate a decision procedure for finding
whether M ≥ M* based on Theorem 2.7.


2. Given two machines M and M*, formulate a decision procedure for finding
whether M ≡ M*, based on the fact that M ≡ M* implies that rank M =
rank M*, on Theorem 3.2, and on Corollary 4.3.


3. Let M and M* be the machines whose defining matrices are 


4 4 0 03 0 
AM(Q(0)=|25 S O,  — 4"(10)-—j|0O à 0 
0 0 0 0 1 0 
0 0 0 4 0 4 
AMOI) —|0 e * wie 0 +s 
0 3 t 1 t 
Te & de fX 
A"'(000 =| $ £t 1 sao- 0 1 
0 0 0 idi 
0 0 0 4 0 4 
A"'(01) —|i & 1 san- 0 1 
$ Te e à 0. i 
Check for M ≥ M* and M* ≥ M.
5. Minimization of States by Covering, Problem I


In the preceding section, we have seen that any machine can be reduced to an
equivalent reduced-form or minimal-state-form machine. We now consider
further reduction of the number of states of a minimal-state-form machine.
The following problems are considered:


Problem I: Find an n*-state machine M* such that M* ≥ M and n* < n.

Problem II: Find an n*-state machine M* such that M ≥ M* and n* < n.

Problem III: Let (M, π) be an initiated machine; find an initiated machine
(M*, π*) such that (M, π) ≡ (M*, π*) and n* < n.

A solution to Problem I yields a machine capable of realizing more functions
than the original and with fewer states, and a solution to Problem II a machine
less general than the original and again with fewer states. [The need for
considering the latter problem is due to the fact that there are cases in which it
alone has a solution.] In Problem III we seek a minimal-state realization of a
particular function defined by a given machine.

We have proved in Section 3 [Theorem 3.1] that M* ≥ M holds for two
machines M* and M, if and only if there exists a stochastic matrix B* such
that

$$B^* \eta^{M^*}(v|u) = \eta^{M}(v|u) \quad \text{for all pairs } (v, u) \tag{17}$$

or, equivalently, such that

$$B^* A^{M^*}(y|x) H^{M^*} = A^{M}(y|x) B^* H^{M^*} \tag{18}$$

If M is given and an answer to Problem I sought for it without additional
information on M*, (17) or (18) are of little use, as the matrix H^{M*} is not known.
On the other hand, one may begin the search for a solution by assuming that
rank M* = rank M. If this is the case, then using Theorem 3.2, we know that
a matrix B* satisfying (17) must also satisfy B*H^{M*} = H^M. Since only H^M is
given, one may begin with any H^{M*} matrix such that

$$\operatorname{conv}(h_1^{M^*}, \ldots, h_{n^*}^{M^*}) \supseteq (h_1^{M}, \ldots, h_n^{M}) \quad \text{and} \quad n^* < n \tag{19}$$

[since B* is stochastic, (19) is necessary], and try to reconstruct M* according
to (18). The following algorithm ensues:


Step 1: Assume rank M = rank M*, and find any matrix H^{M*} satisfying (19).
Step 2: Find a matrix B* satisfying

$$B^* H^{M^*} = H^{M} \tag{20}$$

Step 3: Solve (18) for A^{M*}(y|x), subject to the condition that the matrices
A^{M*}(y|x) be nonnegative.
If all steps prove effective, the algorithm yields a solution to Problem I. In
some other cases it may provide a definite negative answer as to the existence
of a solution. Unfortunately, it has many shortcomings in the general case, and
these are considered in the following comments on its three steps.


Step 1: The equality rank M* = rank M has not been proved to be a necessary
assumption. In other words, failure to find a solution to Problem I under
this assumption does not mean that no solution exists. On the other hand, no
example has been found to date showing that the following situation may occur:
M* ≥ M, n* < n, rank M* > rank M, and there is no M* ≥ M with n* < n such
that rank M* = rank M; in such a situation no method is available for finding a
covering machine M*. There is, however, at least one case which necessitates the
above assumption, namely that of rank M = n − 1 [see Exercise 1 in this
section].

Still another shortcoming of the first step is that it involves another problem
to which no general solution is known (although solutions are available for some
particular cases), namely: Given a polyhedron within the positive unit cube,
find another polyhedron within the cube with fewer vertices and covering the
given polyhedron.


Step 2: With a matrix H^{M*} assumed, there may exist an infinity of matrices
B* satisfying (20), obtainable by linear-algebra methods, but we need not check
all of them. To ascertain whether the assumed H^{M*} actually leads to a covering
machine as required, it suffices to check a single B* satisfying (20), and if Step
3 fails here, this signifies that the assumed H^{M*} is unsuitable. This follows from
Corollary 4.2 and Theorem 3.2. [The reader is advised to attempt a detailed
proof.]


Step 3: Under the assumption that rank M = rank M*, the matrix B* found
in Step 2 is a transformation which preserves the rank of the row space of H^{M*}
and thus has a (nonstochastic) left inverse B such that BB*H^{M*} = H^{M*}.
Multiplying both sides of (18) by B, we have BB*A^{M*}(y|x)H^{M*} = BA^M(y|x)B*H^{M*}.
Thus one can write the matrix BB* in the form BB* = I + N, where I is the
n*-dimensional unity matrix and NH^{M*} = 0. Let γ be any column vector which
is a linear combination of the columns of H^{M*}; then Nγ = 0. However, since
all columns of the matrix A^{M*}(y|x)H^{M*} are linear combinations of the columns
of H^{M*}, BB*A^{M*}H^{M*} = (I + N)A^{M*}H^{M*} = A^{M*}H^{M*} and the following
equation results

$$A^{M^*}(y|x) H^{M^*} = B A^{M}(y|x) B^* H^{M^*} \tag{21}$$

Solving Eq. (21) for A^{M*}(y|x) [all other matrices are known], subject to the
restriction that A^{M*}(y|x) are nonnegative matrices, provides the answer to our
problem. The system (21), subject to the above restriction, is readily reduced
to a set of linear programming problems. Failure to find a nonnegative solution
to (21) indicates that the chosen matrix H^{M*} is unsuitable and a fresh start is
called for.

It is thus seen that the above three-step procedure is not an algorithm in the
ordinary sense, for even in the case where the assumption that rank M = rank
M* is justified there still may be an infinity of H^{M*} matrices satisfying (19)
which may serve as starting point for Step 1.


Example 7: Let M be a five-state machine [X = {0, 1}, Y = {0, 1, 2}]


04000 00000 
0400 0 00000 
A(00)—]0.0 0 0 0, 4A(]0)2|0 2010 
00000 040 40 
00000 00000 
000410 4 0000 
00040 00000 
4200 2|00 0 4 O0, 4(01)—2]00 000 
00010 40000 
0000 I 00000 
00000 0040 0 
4+ 04 0 0 00400 
Aj) =}4 0 4 0 03, 4(2])2|]0 02 0 0 
00000 004100 
00000 01 004 
An H^M matrix for this machine is

133 3| ^ 

13 03; h 

H^"—|1 00 1| A 

1023 4| h 

100 1] A 


As the first coordinate of the h's is always 1, one may use a three-dimensional
subspace (again with first coordinate 1). The geometrical representation is given
in Figure 9.

Figure 9. Geometrical interpretation of H^M.

Since rank M = 4 = n − 1, we have here that any covering machine M*
with fewer states has the same rank (see Exercise 1, Section 5) and four states.
The figure shows that the only possible choice, in this case, for H^{M*} is


1 1 0 3] A* 
ye = 1 0 0 4] h* 
10 1 4| A* 
100 1| A* 
and the matrix B* satisfying (20) 
4 04 0 
23500 
B*=|0 1 0 0 
04 40 
000 I 


Let us now try to solve (18) for A^{M*}(1|0), replacing all other matrices in that
equation with the above ones. This yields (B*H^{M*} = H^M)


10240 0 00 0 O][1 24 à 4 
4400 00000]|1304 
0 1 0 olaaa —|o0 + 03 0|1 00 3 
0110 0-1 9 4-0] 1 0-1 
0001 0000 0]/1 0 0 1 
The first, second, and fifth rows on the right-hand side are zero rows. As for
the left-hand side, assuming that a nonnegative matrix A^{M*}(1|0) satisfying the
above equation exists, it is seen that the first and third rows of A^{M*}(1|0)H^{M*}
are zero rows [contributing to formation of the first row on the right-hand side];
the first and second rows of A^{M*}(1|0)H^{M*} must be zero rows [contributing to
the formation of the second row on the right-hand side]; finally the fourth row
of A^{M*}(1|0)H^{M*} is a zero row [being identical to the fifth row on the right-hand
side]. Thus all rows of A^{M*}(1|0)H^{M*} must be zero rows, but this is impossible
as there are nonzero rows on the right-hand side. The conclusion is that there
is no machine M* covering M with fewer than five states, although rank M = 4.


Example 8: Let M be a four-state machine [X = Y = {0, 1}], 


4 400 i 0 4 0| 
0000 2 0 40 
A00) =|, . 9 of A(1|0) = | * Me 
8 38 8 8 
4400 4 0 4 0) 
0000 040 4] 
004i 4 0 1 0 4 
A01) = tik a= ia; 
0043 040% 
00% 0 £06 
and an $H^M$ matrix for this M is 

14 0] A 

aT E 

1 3$ 2| h 

13 l hu 


As rank M = 3 = n − 1, a covering machine M* with fewer states must have 
the same rank and three states. The geometrical representation of this $H^M$ is 
shown in Figure 10, and it is seen that the only possible choice for $H^{M^*}$ is 


1 4 0 
H^'—|10 i 
1 1 1 
Using this $H^{M^*}$, two possible matrices B* and B are found as 
100 
DrD 1 000 
Be = ; B=|0 1 0 0 
à 0 ż 
0 -—-1 0 2 
03 £ 


Using these matrices and (21) the machine M* is found 
Figure 10. Illustration to Example 8. 
4 40 $04 
amoo =o 0 O|,  A"(10—]|à 0 4 
110 000 
000 03 4 
AOD =|} + ij AUD =)0 $ ż 
bad 000 


By construction, M* ≥ M, and M* has only three states. 


EXERCISES 

1. Let M* ≥ M be machines with n* and n states respectively, and such that 
n* < n. Prove that rank M ≤ rank M* < n, hence rank M = n − 1 implies 
rank M = rank M*. 


2. Let M* ≥ M be as in Exercise 1, let m* and m be their respective ranks and 
B* the matrix satisfying (17) for these machines. Prove that 

a. $m \le m^* \le m + n^* - \operatorname{rank} B^* \le n^* < n$, hence if rank B* = n*, then m* = m. 

b. $m \le \operatorname{rank} B^* \le m + n^* - m^* \le n^*$. 

Hint: Use Sylvester's inequalities and (17). 
OPEN PROBLEMS 


a. Answer the following decision problem, or prove that it is not decidable: 


Given a machine M, does there exist a machine M* with fewer states than 
M and such that M* ≥ M? 


b. If the problem under (a) is decidable, then construct a finite algorithm for 
finding a machine M* ≥ M with $|S^{M^*}| < |S^M|$, whenever such a machine M* 
exists. 


c. Construct an algorithm for finding all solutions to the following problem: 


Given two convex polyhedra $V_1$ and $V_2$ such that $V_1$ covers $V_2$ [i.e., the 
vertices of $V_2$ are convex combinations of those of $V_1$], find a third polyhedron $V_3$, 
with a minimal number of vertices, which covers $V_2$ and is covered by $V_1$. 


6. Minimization of States by Covering—Problem II 


This section deals with the problem of finding a machine M* covered by a 
given machine M and with fewer states. 
Replacing M* with M in (18), we have 

$$B A^M(y|x) H^M = A^{M^*}(y|x) B H^M \qquad (22)$$


Since M is given and so are $H^M$ and $A^M(y|x)$, Problem II appears to be simpler 
in the sense that (22) can be used without any a priori assumption as to the 
rank of M*. Thus one can assume any stochastic matrix B having fewer rows 
than columns and try to solve (22) in terms of $A^{M^*}(y|x)$ [for all pairs (y, x)] 
subject to the restriction that the matrices $A^{M^*}(y|x)$ be nonnegative. 

If no solution exists for a given B, another is assumed and so on. The drawback 
here is that the number of candidate matrices B is infinite, and no means has 
been found to date for solving the problem [or deciding that no solution exists] 
on the basis of a finite number of checks. 


Definition 6.1: If $H^M$ is the matrix H associated with a machine M and A any 
nonnegative matrix of suitable dimension, then $h^M(A)$ is the set of all nonzero 
vectors of the form $h^M(A_i)$, $A_i$ denoting the ith row of A [see Definition 3.3]. 

The following theorem is a geometrical interpretation of (22), enabling us to 
check whether or not a chosen matrix B provides a solution to our problem. 


Theorem 6.1: Let M be a machine. There exists a machine M* ≤ M with 
n* < n states if and only if there exists a stochastic n* × n matrix B such that 

$$\bigcup_{(y,x)} h^M(B A^M(y|x)) \subseteq \operatorname{conv} h^M(B) \qquad (23)$$

A machine M* as above can be constructed effectively if a matrix B satisfying 
(23) is given. 
Proof: Assume first that M ≥ M* with n* < n. Then (22) is satisfied by 
some n* × n stochastic matrix B. Let $\zeta' = (\zeta'_1, \ldots, \zeta'_{n^*})$ be any nonzero row in 
$A^{M^*}(y|x)$ and $\zeta = (\zeta_1, \ldots, \zeta_n)$ the corresponding row of $BA^M(y|x)$ on the left-hand 
side of (22). All entries in the first column of $H^M$ are 1 and, since B is a 
stochastic matrix, so are all entries in the first column of $BH^M$. Thus 

$$\zeta H^M = \zeta' B H^M \;\Rightarrow\; \sum_i \zeta_i = \sum_i \zeta'_i \;\Rightarrow\; \Big(\zeta \big/ \sum_i \zeta_i\Big) H^M = \Big(\zeta' \big/ \sum_i \zeta'_i\Big) B H^M$$

Now $\zeta'/\sum_i \zeta'_i$ is a probabilistic vector, hence $(\zeta'/\sum_i \zeta'_i) BH^M$ is a convex 
combination of the rows of $BH^M$, or $(\zeta'/\sum_i \zeta'_i) BH^M \in \operatorname{conv} h^M(B)$. On the other 
hand, $(\zeta/\sum_i \zeta_i) H^M = h^M(\zeta)$ by definition, so that $\bigcup_{(y,x)} h^M(BA^M(y|x)) \subseteq \operatorname{conv} h^M(B)$. 

Assume now that there exists an n* × n stochastic matrix B satisfying (23); 
then any row vector in the left-hand side of (23) is a convex combination of 
the points in $h^M(B)$. Those vectors [on the left-hand side] are of the form $\alpha\zeta H^M$, 
where $\alpha$ is a normalizing constant and $\zeta$ is a row in a matrix $BA^M(y|x)$ for some 
pair (y, x). We thus have 

$$\alpha \zeta H^M = \pi B H^M \qquad (24)$$

where $\pi$ is a stochastic vector. 

It is readily seen that (22) is satisfied if the matrices $A^{M^*}(y|x)$ are defined as 
follows: 

a. If a row in $BA^M(y|x)$ is a zero row, then so is the corresponding row in 
$A^{M^*}(y|x)$. 

b. Let $\zeta$ be a nonzero row in $BA^M(y|x)$; then the corresponding row in 
$A^{M^*}(y|x)$ is $(1/\alpha)\pi$, where $\pi$ and $\alpha$ are as in (24). The theorem is thus 
proved. 
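
The construction in the proof can be tried out mechanically for a fixed candidate B. The sketch below is only an illustration under stated assumptions: the three-state machine, its matrix H, and the matrix B are hypothetical toy data (not one of the examples of this section), the helper names `matmul`, `solve_unique`, `nonneg_solution`, and `covered_machine` are invented here, and the nonnegative rows are found by a brute-force search over supports with exact rational arithmetic rather than by linear programming.

```python
from fractions import Fraction as F
from itertools import combinations

def matmul(X, Y):
    """Exact product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve_unique(vectors, target):
    """Solve sum_j c_j * vectors[j] == target by Gauss-Jordan elimination.
    Returns None if the vectors are dependent or the system is inconsistent."""
    k, d = len(vectors), len(target)
    aug = [[F(vectors[j][i]) for j in range(k)] + [F(target[i])] for i in range(d)]
    row = 0
    for col in range(k):
        piv = next((r for r in range(row, d) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[row], aug[piv] = aug[piv], aug[row]
        pv = aug[row][col]
        aug[row] = [a / pv for a in aug[row]]
        for r in range(d):
            if r != row:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[row])]
        row += 1
    if any(aug[r][k] != 0 for r in range(row, d)):
        return None
    return [aug[j][k] for j in range(k)]

def nonneg_solution(rows, target):
    """Find a >= 0 with a * rows == target by trying every support set.
    A nonnegative solution, if any, exists on linearly independent rows."""
    n = len(rows)
    if all(t == 0 for t in target):
        return [F(0)] * n          # case (a) of the proof: zero row
    for size in range(1, n + 1):
        for support in combinations(range(n), size):
            c = solve_unique([rows[i] for i in support], target)
            if c is not None and all(x >= 0 for x in c):
                a = [F(0)] * n
                for i, x in zip(support, c):
                    a[i] = x
                return a
    return None

def covered_machine(A, H, B):
    """Solve B A(y|x) H = A*(y|x) B H for nonnegative A*(y|x); None if impossible."""
    BH = matmul(B, H)
    result = {}
    for key, Ayx in A.items():
        rows = [nonneg_solution(BH, r) for r in matmul(matmul(B, Ayx), H)]
        if any(r is None for r in rows):
            return None
        result[key] = rows
    return result

# Hypothetical machine, keys are (y, x); states 2 and 3 are equivalent.
A = {(0, 0): [[F(1, 2), F(0), F(0)], [F(0), F(1, 4), F(0)], [F(0), F(0), F(1, 4)]],
     (1, 0): [[F(0), F(1, 4), F(1, 4)], [F(3, 4), F(0), F(0)], [F(3, 4), F(0), F(0)]]}
H = [[F(1), F(1, 2)], [F(1), F(1, 4)], [F(1), F(1, 4)]]
B = [[F(1), F(0), F(0)], [F(0), F(1, 2), F(1, 2)]]
m_star = covered_machine(A, H, B)
```

For this toy input the search returns a two-state machine; a candidate B for which some row admits no nonnegative solution makes `covered_machine` return `None`, which corresponds to discarding that B and assuming another.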
Corollary 6.2: Let M be an n-state machine of rank m. Let $h_1^*, \ldots, h_{n^*}^*$ be a 
set of n* < n points in m-dimensional space, such that 

$$\bigcup_{(y,x)} h^M(A^M(y|x)) \subseteq \operatorname{conv}(h_1^*, \ldots, h_{n^*}^*) \subseteq \operatorname{conv}(h_1, \ldots, h_n) \qquad (25)$$

then there exists an n*-state machine M* ≤ M and M* can be effectively 
constructed if the points $h_1^*, \ldots, h_{n^*}^*$ are given. 
Proof: Let B be the stochastic n* × n matrix such that 

$$BH^M = \begin{bmatrix} h_1^* \\ \vdots \\ h_{n^*}^* \end{bmatrix}$$

Since $\operatorname{conv}(h_1^*, \ldots, h_{n^*}^*) \subseteq \operatorname{conv}(h_1, \ldots, h_n)$, B can be found effectively. For 
any stochastic matrix B, it is true that 

$$\bigcup_{(y,x)} h^M(BA^M(y|x)) \subseteq \operatorname{conv} \bigcup_{(y,x)} h^M(A^M(y|x))$$

so that, for the above B we have 

$$\bigcup_{(y,x)} h^M(BA^M(y|x)) \subseteq \operatorname{conv}\Big(\bigcup_{(y,x)} h^M(A^M(y|x))\Big) \subseteq \operatorname{conv}(h_1^*, \ldots, h_{n^*}^*) = \operatorname{conv} h^M(B)$$

by the definition of B. 

Equation (23) of Theorem 6.1 is thus verified for the above matrix B, and 
the corollary follows. 

A particular case of the above corollary would be when the points $(h_1^*, \ldots, h_{n^*}^*)$ 
are a subset of the points $(h_1, \ldots, h_n)$. In this case B would be a degenerate 
stochastic matrix. We therefore also have the following: 


Corollary 6.3: Let M be an n-state machine. If there exists a subset $(h_1^*, \ldots, h_{n^*}^*)$ 
of the set of points $(h_1, \ldots, h_n)$ such that 

$$\bigcup_{(y,x)} h^M(A^M(y|x)) \subseteq \operatorname{conv}(h_1^*, \ldots, h_{n^*}^*) \qquad (26)$$

then an n*-state machine M* ≤ M can be constructed effectively. 


The above corollaries may help in solving our problem in some particular 
cases. On the other hand, the following remarks are in order: 


1. The conditions specified in the corollaries are only sufficient conditions, 
and a solution to the covering problem may exist even if the conditions do not 
hold for a given machine (see Exercise 1 at the end of this section). 


2. While the conditions of Corollary 6.3 are decidable (prove this fact), this 
is not known to be true for those of Corollary 6.2. In fact the latter involve 
the unsolved problem mentioned on p. 38. 


Example 8 (continued) 

We shall show, using a procedure based on Theorem 6.1, that there exists no 
machine M* ≤ M with n* < 4 states, where M is the machine in Example 8. 
This will show that the second covering problem is nontrivial in the sense that 
a solution is not always available. 

We first arrange the set $\bigcup_{(y,x)} h^M(A^M(y|x))$ in tabular form (Table III), where 
$s_1, \ldots, s_4$ are the states of M, and $h_1, \ldots, h_4$ the rows of $H^M$; if a row 
corresponding to $s_i$ in some matrix $A^M(y|x)$ is a zero row, then the corresponding 
entry in the table is zero. 


Table III. Distribution of the set $\bigcup_{(y,x)} h^M(A^M(y|x))$ according to states and matrices in $\{A^M(y|x)\}$ 

T hM(AM(0|0)) hM(AM(1|0)) hM(AM(0|1)) hM(AM(1|1)) 
s1 30, + h) 20, hy) 0 $05 + ha) 
s2 0 i0 + h3) $05 + hj) Ahr + ha) 
s3 iQ + hy) 304 + hy) Ah, + hy) i05 + ha) 
s4 i0, + hj) i04 + hs) i05 + hy) i05 + hy) 


Let B be any stochastic matrix with m ≤ 3 rows. 
The table T', corresponding to the set $\bigcup_{(y,x)} h^M(BA^M(y|x))$, has only m rows. 
A nonzero entry in a column of T' will be a convex combination of the nonzero 
entries in the corresponding column of T. [This follows from the definitions.] 
Since all nonzero entries in a column of T are identical, any convex combination 
of them results in an entry having the same value as the combined entries. 
We shall consider two cases: 


a. The matrix B [which has m ≤ 3 rows] has nonzero entries in two, three, 
or all of its columns. The entries in the rows of T' are, in this case, convex 
combinations of the corresponding entries in at least two rows of T; hence T' 
has nonzero entries in all its columns, which are identical to the nonzero entries 
in the corresponding columns of T. This implies that 

$$\Big|\bigcup_{(y,x)} h^M(BA^M(y|x))\Big| = \text{number of different nonzero entries in } T' = \text{number of different nonzero entries in } T = \Big|\bigcup_{(y,x)} h^M(A^M(y|x))\Big|$$

On the other hand, since B has m ≤ 3 rows, we have that the set $h^M(B)$ has 
at most three different points. It is seen in Figure 10, where the points in the 
set $\bigcup_{(y,x)} h^M(A^M(y|x))$ are denoted by $u_1, \ldots, u_4$, that (23) cannot be satisfied, 
since the set $u_1, \ldots, u_4$ cannot be covered by a convex closure of three points 
only, inside $\operatorname{conv}(h_1, \ldots, h_4)$. 


b. If the matrix B has nonzero entries in one column only, then the table 
T' has nonzero entries in at least three columns. In this case the set 
$\bigcup_{(y,x)} h^M(BA^M(y|x))$ contains three of the four points $u_1, \ldots, u_4$ at least. On 
the other hand, the set $h^M(B)$ contains only one point, since B 
has only one nonzero column, and (23) cannot be satisfied in this case either. 


Example 9: Let M be the four-state machine (X = Y = {0, 1}) 


i $ 4 0 0 i i 0| 

3 3 3. 0 0 4 10 
Alo) =|" 7? 'T AQ | T T 

i $ s 0 0 $3 € O0 

00 00 03 7% 0| 

00 0 0 i 0 0 1i] 

0 $ à i s 0r 0. 
40)=|, 3 a at DS) 9 og 

16 316 8 Ey $8 

EE E å 1 0 0 1j 


A matrix $H^M$ for this machine is 
1 4 0] A 
MALE E E 
1 2 4) h 
103] h 


Figure 11 shows the sets $(h_1, \ldots, h_4)$ and $\bigcup_{(y,x)} h^M(A^M(y|x)) = (u_1, \ldots, u_4)$. 


Figure 11. Illustration to Example 9. 


The reader is advised to compute these points and verify that their position in 
the figure is correct. It is seen that the choice h,* = I, h,* = u, h* = h, 
satisfies the condition of Corollary 6.2. 

The resulting matrix B is 


e 
e 


0 
0 
00 1 


and the resulting machine M* is [The reader is advised to verify the results by 
actual computation. ] 


nu 
n= 


1 
B=]|0 
0 


4 4 0 040 
AM'(0) =|; ~ 0,  A4"()-—lo 2 0 
0 0 0 0 1 
0 0 0 4 0 4 
AV'(00) — |0 $& fe —4"ülD)-—|$ 0 f 
0 1 4 à OO 4 
EXERCISES 
1. Show that there exists a machine M* ≤ M with n* = 4, where M is the 
machine in Example 7. Hint: Use the matrix 


B= 


1 


000 
0100 
00 10 
000 1 


oo o0 


e 


Note: Show that the machine M in Example 7 with the matrix B above, does 
not satisfy the condition of Corollary 6.2. 


2. Is it possible for a machine M with n states both to cover a machine with 
fewer than n states and to be covered by a machine with fewer than n states? Hint: 
Solve covering Problem I for the machine in Example 9. 


3. Let M be the (deterministic) four-state machine [X = {0, 1, 2}, Y = {0, 1}] 


[1 


— 


A(0]0) = 


A(1|0) = 


Lı 


1 
0 
AOI) = | 


0 


0 


o o0 c ccc ccoco 


0 0 


e 
© 


, 


> 


o oc ccc occo 
ooo ooo c occ 


0 0 


A(0]1) = 


A(]1) = 


A(1\2) = 


[1 


[1 
[0 


1 


0 


o occ oo oococcooco 


0 


0 


o oco oooo ooo oo 


0 


07 


oo 


a. Find an $H^M$ for this machine, and show that it is a reduced-form (and 
minimal state form) machine. 


b. Show that there exists an n*-state machine M* ≤ M with n* < n states 
[compare Exercise 2, Section 3]. 


4. Let M be a machine [X = {0, 1}, Y = {0, 1, 2}] 


0 


T 


A(0|0) = 


oo coc O 
oo On 


oo 000 


0 0 


0, 
0 
0 


oc oc o0 


A(1|0) = 


oo 009 Oo 


oo oooO 


oO ooo oo 


y Oo coc 


Oo w- 


oo ooo oo 
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00001 10000 
00001 00000 
42002100 00 4, £AO1)=}0 0 00 0 
00001 140000 
$0400 00000 
00000 
00400 
Ai1D)-2|0 0 4 0 OL — A(t) = 420) 
00000 
00000 


a. Find a matrix $H^M$ for this machine and show that it is minimal-state form, 
strongly connected, has equivalent strongly connected nonisomorphic machines, 
and its rank is smaller than its number of states. 


b. Show that no machine M* exists with fewer than five states and such that 
M* ≥ M or M* ≤ M. 


5. Consider the following. 


Definition: A sequential pseudostochastic machine is a quadruple M = 
(S, X, Y, {A(y|x)}) where all elements in the quadruple are as in Definition 1.1, 
but the entries in the matrices A(y|x) may be negative, positive, or zero. 


a. Prove the following 


Theorem: Let M be an n-state machine of rank m < n. There exists a pseudo- 
stochastic sequential machine M* with m states such that M and M* are equiva- 
lent, (equivalence being defined in the usual way). 


b. Find the four-state pseudomachine equivalent to the one defined in 
Exercise 4 above. 


OPEN PROBLEM 
a. Answer the following decision problem, or prove that it is not decidable: 


Given a machine M, does there exist a machine M* with fewer states than 
M and such that M* ≤ M? 


b. If the problem under (a) is decidable, then construct a finite algorithm 
for finding a machine M* ≤ M with $|S^{M^*}| < |S^M|$ whenever such a machine 
M* exists. 
7. Minimization of States by Covering—Problem III 


In this section, Problem III [i.e., that of finding an initiated machine $(M^*, \pi^*)$ 
having a minimal number of states and equivalent to a given initiated machine 
$(M, \pi)$] is reduced to the two problems considered in the previous sections. 

Let $(M, \pi)$ be a given initiated machine. As in Exercise 5, Section 1, we can 
construct a matrix $G^{(M,\pi)}$ whose rows are of the form $\bar\pi(v, u)$ for some pairs 
(v, u) and linearly independent, and any vector of the form $\bar\pi(v, u)$ is a linear 
combination of them. We shall now prove the following: 


Theorem 7.1: Given an initiated machine $(M, \pi)$ and a machine M*, there 
exists a stochastic vector $\pi^*$ such that $(M, \pi) \equiv (M^*, \pi^*)$ if and only if there 
exists a stochastic matrix B* such that 

$$B^*[K^{M^*}] = G^{(M,\pi)}[K^M] \qquad (27)$$

Proof: Assume first that (27) is satisfied, and let $\pi^*$ be the first row in B*. 
Then, since the first row of $G^{(M,\pi)}$ is $\pi$, we have that the first entry in a column 
of the form $G^{(M,\pi)}\eta^M(v|u)$ on the right-hand side equals $p_\pi^M(v|u)$. The corresponding 
value on the left-hand side is the first entry in the column $B^*\eta^{M^*}(v|u)$, 
which equals $p_{\pi^*}^{M^*}(v|u)$, so that $(M, \pi) \equiv (M^*, \pi^*)$. Assume now that there 
exists a vector $\pi^*$ such that $(M^*, \pi^*) \equiv (M, \pi)$; then $\pi^*[K^{M^*}] = \pi[K^M]$. This 
implies that $\pi^* A^{M^*}(v|u)[K^{M^*}] = \pi A^M(v|u)[K^M]$, since the columns of the matrix 
$A^M(v|u)[K^M]$ are a subset of the columns of $[K^M]$ and those of $A^{M^*}(v|u)[K^{M^*}]$ are 
the corresponding columns in $[K^{M^*}]$. But any row vector $\bar\pi$ in $G^{(M,\pi)}$ is of the 
form $a\pi A^M(v|u)$, where a is a normalizing constant; hence for any such vector 
there exists a corresponding vector $a\pi^* A^{M^*}(v|u) = \bar\pi^*$ such that $\bar\pi^*[K^{M^*}] = 
\bar\pi[K^M]$. The matrix B* whose rows are the vectors $\bar\pi^*$ corresponding to the rows $\bar\pi$ of 
$G^{(M,\pi)}$ satisfies (27), and the theorem is proved. 


Theorem 7.1 above reduces the third problem to one of finding a machine 
M* having fewer states than the given initiated machine $(M, \pi)$, and such that 
the relation 

$$B^*[K^{M^*}] = [K^{(M,\pi)}] \qquad (28)$$

holds for some stochastic matrix B*, where $[K^{(M,\pi)}]$ denotes the matrix 
$G^{(M,\pi)}[K^M]$. For tackling this problem, a relation similar to (18), which is 
equivalent to (28), can be derived. 

Let $H^{(M,\pi)}$ be a matrix having the following properties: 

1. The columns of $H^{(M,\pi)}$ are columns of $[K^{(M,\pi)}]$. 

2. The columns of $H^{(M,\pi)}$ are linearly independent, and any column in $[K^{(M,\pi)}]$ 
is a linear combination of them. 

3. The columns of $H^{(M,\pi)}$ are the first columns of $[K^{(M,\pi)}]$ satisfying 1 and 2 
above. 
Clearly, the columns of $H^{(M,\pi)}$ can be chosen from those of the form 
$\eta^{(M,\pi)}(v|u) = G^{(M,\pi)}\eta^M(v|u)$ with $l(v, u) \le n - 1$. To prove this, we note that 
any column in $[K^{(M,\pi)}]$ has the form 

$$G^{(M,\pi)}\eta^M(v|u) = G^{(M,\pi)} \sum_i a_i \eta_i^M = \sum_i a_i G^{(M,\pi)} \eta_i^M$$

where $\eta_i^M$ are the columns of $H^M$ and $a_i$ constants. It follows that the matrix 
$H^{(M,\pi)}$ can be effectively constructed. 

Denote by $H^{(M,\pi)}(y|x)$ the matrix such that its ith column is $\eta^{(M,\pi)}(yv|xu)$ if 
the ith column of $H^{(M,\pi)}$ is $\eta^{(M,\pi)}(v|u)$; likewise, $K^{(M,\pi)}(y|x)$ and $[K^{(M,\pi)}(y|x)]$. We 
seek a matrix A(y|x) such that 

$$A(y|x)[K^{(M,\pi)}] = [K^{(M,\pi)}(y|x)] \qquad (29)$$

But Eq. (29) is satisfied by any matrix A(y|x) satisfying also 

$$G^{(M,\pi)} A^M(y|x) H^M = A(y|x) G^{(M,\pi)} H^M \qquad (30)$$

for if A(y|x) satisfies (30), then 

$$[K^{(M,\pi)}(y|x)] = G^{(M,\pi)}[K^M(y|x)] = G^{(M,\pi)} A^M(y|x)[K^M] = A(y|x) G^{(M,\pi)}[K^M] = A(y|x)[K^{(M,\pi)}]$$

by definition, and bearing in mind that the columns of $[K^M]$ are linear combinations 
of the columns of $H^M$. 

Now Eq. (30) has at least one solution [there may be more], being satisfied 
by any matrix A(y|x) satisfying also 

$$G^{(M,\pi)} A^M(y|x) = A(y|x) G^{(M,\pi)} \qquad (31)$$

Eq. (31) has a (unique) solution, for the rows of the matrix on its left-hand side 
$G^{(M,\pi)} A^M(y|x)$ are, by the definition of $G^{(M,\pi)}$, linear combinations of the rows of 
the latter, and these are linearly independent. 

Using the above definitions and a method similar to that used in the proof 
of Theorem 3.1, one can prove the following: 


Theorem 7.2: Given an initiated machine $(M, \pi)$ with n states, there exists an 
initiated machine $(M^*, \pi^*)$ with n* < n states and such that $(M, \pi) \equiv (M^*, \pi^*)$ 
if and only if there exists an n*-state machine M* and an n* x n* stochastic 
matrix (n* < n) B* satisfying the relation 

$$B^* A^{M^*}(y|x) H^{M^*} = A^{(M,\pi)}(y|x) B^* H^{M^*} \qquad (32)$$

The proof of this theorem is left to the reader. Theorem 7.2 reduces the third 
problem to the first covering problem [with $H^{(M,\pi)}$ replacing $H^M$], so that all 
considerations in Section 5 are valid here. Since no general procedure is available 
for solving the first problem, the above theorem, together with Section 5, 
yields solutions to the third problem only in particular cases. 

We shall therefore also consider some additional approaches, based on Section 
6. 
Theorem 7.3: Let M ≥ M* be two machines with $B[K^M] = [K^{M^*}]$, and $\pi$ a 
distribution for M such that $h^M(\pi) \in \operatorname{conv} h^M(B)$. Then there exists a distribution 
$\pi^*$ for M* such that $(M, \pi) \equiv (M^*, \pi^*)$. 


Proof: Since $h^M(\pi) \in \operatorname{conv} h^M(B)$, there exists $\pi^*$ such that $\pi H^M = \pi^* B H^M$. 
Hence $\pi$ and $\pi^* B$ are equivalent vectors for the machine M, so that $\pi\eta^M(v|u) = 
\pi^* B\eta^M(v|u)$ for all pairs (v, u). It follows that 

$$p_\pi^M(v|u) = \pi\eta^M(v|u) = \pi^* B\eta^M(v|u) = \pi^*\eta^{M^*}(v|u) = p_{\pi^*}^{M^*}(v|u)$$

for all pairs (v, u), and the theorem follows. 
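
A small numerical check of Theorem 7.3 may help fix the notation. The sketch below assumes a hypothetical three-state machine M (states 2 and 3 equivalent), a two-state machine M* covered by it through the matrix B, and distributions chosen so that $\pi H^M = \pi^* B H^M$; none of these data come from the examples of this section, and the function names are invented for the illustration. It verifies the premise and then compares the two induced functions on all output sequences of length at most 4.

```python
from fractions import Fraction as F
from itertools import product

def row_times(vec, mat):
    """Row vector times matrix, exact arithmetic."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]

def p(mats, pi, v, u):
    """p_pi(v|u) = pi A(v_1|u_1) ... A(v_k|u_k) eta, eta the all-ones column."""
    vec = list(pi)
    for y, x in zip(v, u):
        vec = row_times(vec, mats[(y, x)])
    return sum(vec)

# Hypothetical data; matrices are keyed by (y, x) and the only input symbol is 0.
A = {(0, 0): [[F(1, 2), F(0), F(0)], [F(0), F(1, 4), F(0)], [F(0), F(0), F(1, 4)]],
     (1, 0): [[F(0), F(1, 4), F(1, 4)], [F(3, 4), F(0), F(0)], [F(3, 4), F(0), F(0)]]}
A_STAR = {(0, 0): [[F(1, 2), F(0)], [F(0), F(1, 4)]],
          (1, 0): [[F(0), F(1, 2)], [F(3, 4), F(0)]]}
H = [[F(1), F(1, 2)], [F(1), F(1, 4)], [F(1), F(1, 4)]]
B = [[F(1), F(0), F(0)], [F(0), F(1, 2), F(1, 2)]]
PI = [F(1, 2), F(1, 6), F(1, 3)]
PI_STAR = [F(1, 2), F(1, 2)]

# Premise of the theorem: pi H = pi* B H.
premise = row_times(PI, H) == row_times(row_times(PI_STAR, B), H)

# Conclusion: both initiated machines induce the same probabilities.
agree = all(
    p(A, PI, v, (0,) * k) == p(A_STAR, PI_STAR, v, (0,) * k)
    for k in range(5)
    for v in product((0, 1), repeat=k)
)
```

A finite check of this kind cannot replace the proof; it only confirms that the chosen data behave as the theorem predicts on short sequences.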


Corollary 7.4: Let $(M, \pi)$ be an initiated machine with n states. There exists 
an initiated machine $(M^*, \pi^*)$ with n* < n states and such that $(M, \pi) \equiv 
(M^*, \pi^*)$ if either condition 1, 2, or 3, as well as condition 4, holds: 


1. There exists a stochastic n* × n matrix B such that 

$$\bigcup_{(y,x)} h^M(BA^M(y|x)) \subseteq \operatorname{conv} h^M(B)$$

2. There exists a set of n* points $h_1^*, \ldots, h_{n^*}^*$ in m-dimensional space (m = 
rank $H^M$) such that 

$$\bigcup_{(y,x)} h^M(A^M(y|x)) \subseteq \operatorname{conv}(h_1^*, \ldots, h_{n^*}^*) \subseteq \operatorname{conv}(h_1, \ldots, h_n)$$

3. There exists a subset $h_1^*, \ldots, h_{n^*}^*$ of the set of points $h_1, \ldots, h_n$ such that 

$$\bigcup_{(y,x)} h^M(A^M(y|x)) \subseteq \operatorname{conv}(h_1^*, \ldots, h_{n^*}^*)$$

4. Let B be as under condition (1) if that condition is satisfied, or otherwise 
a matrix defined by 

$$BH^M = \begin{bmatrix} h_1^* \\ \vdots \\ h_{n^*}^* \end{bmatrix}$$

if either condition (2) or (3) is satisfied. Then $h^M(\pi) \in \operatorname{conv} h^M(B)$. 

Proof: By Theorems 7.3 and 7.1 and Corollaries 6.2 and 6.3. 
Example 10: Let (M,z) be an initiated machine [X = Y = (0, 1], z = 
(040) 


0000 i 4 4 0] 
1 1 00 0000 

A(00) =|? * i A(1]0) = 

O= E 00 OE hee o 
0 0 0 0 i i 4 0l 
ro 0 0 0 00 4] 
0000 $0025 

A(0]1) = , AM) = 

Ua Pa a ae o 
i011 000 0| 
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A matrix H^ for this machine is 


100 
Pre ae d 
111 
101 


o_ = © 


The points in Uy, h(A(y|x)) are (4, 0), (3, 3), G, $), and (0, 4) (first coor- 
dinate omited), h(x) = (£, 1). 

The second and fourth conditions of Corollary 7.4 are satisfied if we choose 
h,* = (4,0), h;* = (1, 1) and h,* = (0, 4). (The reader is advised to draw an 
illustrating sketch.] 

The resulting matrix B is 


0 


1 
0 i 


oc 


B= 


oO w- 
> on- 


n 


and the required initiated machine is found to be: z* = (45 $ 45), 


[4$ 0 0 4 1 0] 
4"(00)—|1. 0 0,  A*(10-—|0 0 0 
lo 0 0 4 3 0] 

(000 0 0 1 
4"(01)2]0 2 4,  A"*(j)D-|0 0 0 
10 4$ 4 0 0 i| 


The reader is advised to verify the results by actual computation. 


EXERCISES 


1.a. Find the matrices $G^{(M,\pi)}$, $H^{(M,\pi)}$, and $A^{(M,\pi)}(y|x)$ for all pairs (y, x), where 
$(M, \pi)$ is as in Example 10. 


1.b. Find a three-state initiated machine $(M^*, \pi^*)$ equivalent to $(M, \pi)$ in 
Example 10 and different from $(M^*, \pi^*)$ there, using a procedure based on 
Theorem 7.2. 


1.c. Show that the third condition of Corollary 7.4 does not hold for Example 
10. 


2. Prove that if $(M^*, \pi^*) \sim (M, \pi)$, then the number of states of M* is not 
smaller than rank $H^{(M,\pi)}$. 


3. Give a proof of Theorem 7.2. 
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4. Consider the following machine [X = Y = (0, I] z = 4445), 

0000 0000 
0 4 0 i 0 0 0 

4o)=|° 2° 2, qp]? ? 
+ 0 3 O 0421: 
0000 044l 
03203 0i ii 
0000 i. 

M(1|0) = , Aün- 0 1i 
0000 0000 
i010 0000 


Show that the third and fourth conditions of Corollary 7.4 apply to this machine, 
and find a two-state initiated machine $(M^*, \pi^*)$ equivalent to $(M, \pi)$. 


5. Same as 4, but z = (1, 0, 1, 1) and (M*, 2*) has three states. Is further re- 
duction of states possible in this case? 


6. Consider the initiated machine (M, x) whose defining matrices are 


[1000 [0000.0 
1000 0000 
A(0|0) = »  A(1 0) = 
0000 0100 
10 000 [0 1 00 
[0010 [00 0 0 
0000 0 0 1 
(ID ° 0 10 (D 000 
0000 [0 00 1 


and $\pi$ = (1 0 0 0). 
Show that rank $H^{(M,\pi)}$ = 3, but there exists no initiated machine $(M', \pi')$ 
equivalent to $(M, \pi)$ with fewer than four states. 


OPEN PROBLEMS 


a. Answer the following decision problem, or prove that it is not decidable: 
Given an initiated machine $(M, \pi)$, does there exist an initiated machine 
$(M^*, \pi^*)$ with fewer states than $(M, \pi)$ and such that $(M, \pi) \sim (M^*, \pi^*)$? 

b. If the problem under (a) is decidable, then construct a finite algorithm for 
finding an initiated machine $(M^*, \pi^*) \sim (M, \pi)$ with $|S^{M^*}| < |S^M|$ whenever 
such an initiated machine $(M^*, \pi^*)$ exists. 
C. INPUT-OUTPUT RELATIONS 


1. Definitions and Basic Properties 


Definition 1.1: A probabilistic input-output relation is a function p(v|u) whose 
domain is the set of all pairs (v, u) of input-output sequences (of equal length) 
over respective finite input and output alphabets X and Y, whose range is the 
interval [0, 1], and subject to the restrictions: 

1. $p(\lambda|\lambda) = 1$ 

2. $\sum_y p(vy|ux) = p(v|u)$ for all $x \in X$, where the summation is over all $y \in Y$ 

Throughout this section the term “relation” refers to a probabilistic input- 
output relation unless otherwise specified. 

Remark: Note that (1) and (2) in Definition 1.1 imply that 

3. $\sum_y p(y|x) = 1$ for all $x \in X$ 
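
As an illustration of Definition 1.1, the function induced by an initiated machine, $p(v|u) = \pi A(v_1|u_1) \cdots A(v_k|u_k)\eta$ with $\eta$ the all-ones column, can be checked against conditions 1 and 2 on short sequences. The sketch below uses a hypothetical two-state machine with X = Y = {0, 1}; the matrices, the distribution `PI`, and the function names are assumptions made for the example, and the check is finite (sequences up to a chosen length), not a proof.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical machine: A[(y, x)] is the matrix A(y|x).
A = {
    (0, 0): [[F(1, 2), F(0)], [F(0), F(1, 4)]],
    (1, 0): [[F(0), F(1, 2)], [F(3, 4), F(0)]],
    (0, 1): [[F(1, 3), F(1, 3)], [F(0), F(0)]],
    (1, 1): [[F(0), F(1, 3)], [F(1, 2), F(1, 2)]],
}
PI = [F(1, 2), F(1, 2)]
X, Y = (0, 1), (0, 1)

def relation(v, u):
    """p(v|u) for output sequence v and input sequence u (tuples of equal length)."""
    vec = list(PI)
    for y, x in zip(v, u):
        m = A[(y, x)]
        vec = [sum(vec[i] * m[i][j] for i in range(len(vec))) for j in range(len(m[0]))]
    return sum(vec)

def satisfies_definition(p, max_len=3):
    """Check conditions 1 and 2 of Definition 1.1 for sequences shorter than max_len."""
    if p((), ()) != 1:                       # condition 1: p(lambda|lambda) = 1
        return False
    for k in range(max_len):
        for u in product(X, repeat=k):
            for v in product(Y, repeat=k):
                for x in X:                  # condition 2: sum over y of p(vy|ux)
                    if sum(p(v + (y,), u + (x,)) for y in Y) != p(v, u):
                        return False
    return True

ok = satisfies_definition(relation)
```

Condition 3 of the Remark is the special case of condition 2 with v and u empty, so it is covered by the same loop.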


Definition 1.2: An initial segment of length n of a relation p (denoted by $[p]_n$) 
is the part of p which corresponds to input-output pairs of length not exceeding 
n. Any relation p with $[p]_n$ as its initial segment is a completion of $[p]_n$. 


Notation: P(X, Y) denotes the class of all relations over the input and out- 
put alphabets X and Y. 


Definition 1.3: The left-hand derivate of a relation $p \in P(X, Y)$ with respect 
to the pair (u', v') [denoted by $p_{(u',v')}$] is the function 

$$p_{(u',v')}(v|u) = \begin{cases} \dfrac{p(v'v|u'u)}{p(v'|u')} & \text{if } p(v'|u') \ne 0 \\[2mm] \text{the zero function} & \text{otherwise} \end{cases}$$
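
The derivate can be written directly as a higher-order function. The following sketch assumes a relation given as a Python callable `p(v, u)` on tuples; the one-input machine used to produce such a callable is hypothetical, and `derivate` is a name chosen for this illustration. It also checks, on short sequences, that taking derivates one symbol at a time agrees with taking the derivate with respect to the concatenated pair.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical two-state machine with a single input symbol 0; keys are (y, x).
A = {(0, 0): [[F(1, 2), F(0)], [F(0), F(1, 4)]],
     (1, 0): [[F(0), F(1, 2)], [F(3, 4), F(0)]]}
PI = [F(1, 2), F(1, 2)]

def p(v, u):
    """Relation induced by the machine: pi A(v_1|u_1) ... A(v_k|u_k) eta."""
    vec = list(PI)
    for y, x in zip(v, u):
        m = A[(y, x)]
        vec = [sum(vec[i] * m[i][j] for i in range(len(vec))) for j in range(len(m[0]))]
    return sum(vec)

def derivate(rel, u0, v0):
    """Left-hand derivate of rel with respect to the pair (u0, v0)."""
    base = rel(v0, u0)
    if base == 0:
        return lambda v, u: F(0)            # the zero function
    return lambda v, u: rel(v0 + v, u0 + u) / base

d1 = derivate(p, (0,), (1,))                # derivate w.r.t. (x, y) = (0, 1)
d12 = derivate(d1, (0,), (0,))              # then w.r.t. (0, 0)
direct = derivate(p, (0, 0), (1, 0))        # w.r.t. the pair (00, 10) at once

same = all(
    d12(v, (0,) * k) == direct(v, (0,) * k)
    for k in range(4)
    for v in product((0, 1), repeat=k)
)
```

Because the arithmetic is exact (`fractions.Fraction`), the comparison is an equality test rather than a tolerance check.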
Theorem 1.1: The class P(X, Y) has the following properties: 

1. If p is a finite convex combination of functions in P(X, Y) such that $p = 
\sum_{i=1}^n \lambda_i p_i$, $\sum_{i=1}^n \lambda_i = 1$, $\lambda_i \ge 0$, i = 1, 2, ..., n, and (x, y) a pair such that 
$\sum_i \lambda_i p_i(y|x) \ne 0$, then $p_{(x,y)} = \sum_i \mu_i p_{i(x,y)}$ where $\mu_i = \lambda_i p_i(y|x) / \sum_j \lambda_j p_j(y|x)$, so 
that $\sum_i \mu_i = 1$ and $\mu_i \ge 0$ for i = 1, 2, ..., n. 

2. If $(x^1, y^1)$ and $(x^2, y^2)$ are two pairs and $p \in P(X, Y)$ is a relation such 
that $p(y^1|x^1) \ne 0$ and $p(y^1 y^2|x^1 x^2) \ne 0$, then $(p_{(x^1,y^1)})_{(x^2,y^2)} = p_{(x^1x^2,\,y^1y^2)}$. 

3. The class P(X, Y) is closed under convex combinations of its elements. 

4. If $p \in P(X, Y)$, and (x, y) is a pair such that $p(y|x) \ne 0$, then $p_{(x,y)} \in 
P(X, Y)$. 

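Proof: (1): Under the given conditions, we have that 

$$p_{(x,y)}(v|u) = \frac{\sum_i \lambda_i p_i(yv|xu)}{\sum_j \lambda_j p_j(y|x)} = \sum_i \mu_i \frac{p_i(yv|xu)}{p_i(y|x)} = \sum_i \mu_i p_{i(x,y)}(v|u)$$

(2): It follows from the definition that 

$$(p_{(x^1,y^1)})_{(x^2,y^2)}(v|u) = \frac{p_{(x^1,y^1)}(y^2v|x^2u)}{p_{(x^1,y^1)}(y^2|x^2)} = \frac{p(y^1y^2v|x^1x^2u)/p(y^1|x^1)}{p(y^1y^2|x^1x^2)/p(y^1|x^1)} = \frac{p(y^1y^2v|x^1x^2u)}{p(y^1y^2|x^1x^2)} = p_{(x^1x^2,\,y^1y^2)}(v|u)$$

Proof of properties (3) and (4) is left to the reader. 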

Relations induced by stochastic sequential machines are characterized by 

Theorem 1.2: Let $p_1, \ldots, p_n$ be a finite set of functions in P(X, Y). There exists 
an n-state machine M such that $p_i = p_{s_i}^M$ if and only if for every i, and for every 
pair (x, y) such that $p_i(y|x) \ne 0$, the function $p_{i(x,y)}$ is a convex combination of 
the functions $p_j$. 


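Proof: The "only if" part is straightforward and its proof is left to the reader. 
Assume now that the conditions of the theorem hold. If there exists a machine 
M such that $p_i = p_{s_i}^M$, then $p_i(y|x) = p_{s_i}^M(y|x) = s_i A^M(y|x)\eta$ and 

$$p_{i(x,y)}(v|u) = \frac{p_{s_i}^M(yv|xu)}{p_{s_i}^M(y|x)} = \frac{s_i A^M(y|x)\eta^M(v|u)}{s_i A^M(y|x)\eta} = \frac{\sum_j a_{ij}(y|x)\, p_{s_j}^M(v|u)}{\sum_j a_{ij}(y|x)} = \frac{\sum_j a_{ij}(y|x)\, p_j(v|u)}{\sum_j a_{ij}(y|x)}$$

Thus the machine M must satisfy the equations 

$$\sum_j a_{ij}^M(y|x) = p_i(y|x) \qquad (33)$$

$$p_{i(x,y)}(v|u) = \frac{\sum_j a_{ij}(y|x)\, p_j(v|u)}{p_i(y|x)} \qquad (34)$$

But we also have by the conditions of the theorem that 

$$p_{i(x,y)}(v|u) = \sum_j \lambda_{ij}\, p_j(v|u) \qquad (35)$$

Combining the three equations, we have 

$$\sum_j \lambda_{ij}\, p_j(v|u) = \frac{\sum_j a_{ij}(y|x)\, p_j(v|u)}{p_i(y|x)} \qquad (36)$$

or 

$$p_i(y|x) \sum_j \lambda_{ij}\, p_j(v|u) = \sum_j a_{ij}(y|x)\, p_j(v|u) \qquad (37)$$

A possible but not necessarily unique solution to (37) is 

$$a_{ij}(y|x) = p_i(y|x)\lambda_{ij} \qquad (38)$$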


Let M be the machine whose defining matrices are given by (38). We now 
prove by induction that for this machine, $p_{s_i}^M = p_i$ as required: 


1. It follows by construction that $p_{s_i}^M(y|x) = p_i(y|x)$, i = 1, 2, ..., n. 


2. Assume that $p_{s_i}^M(v|u) = p_i(v|u)$ for all i and all pairs (v, u) with $l(u, v) \le 
k$, and let (u, v) be any such pair; then 

$$p_{s_i}^M(yv|xu) = \sum_j a_{ij}(y|x)\, p_{s_j}^M(v|u) = \sum_j a_{ij}(y|x)\, p_j(v|u)$$

and the latter, by (37) and (35), equals 

$$p_i(y|x)\, p_{i(x,y)}(v|u) = p_i(y|x)\, \frac{p_i(yv|xu)}{p_i(y|x)} = p_i(yv|xu)$$

where $l(xu, yv) = k + 1$. Thus $p_{s_i}^M = p_i$, and the proof is complete. 


Corollary 1.3: Let $p \in P(X, Y)$ be a relation. If the set of all nonzero derivates 
of p is finite and contains n different relations, then there exists an n-state 
machine M such that $p = p_{s_1}^M$. 


Proof: Let $p, p_{(u_1,v_1)}, \ldots, p_{(u_{n-1},v_{n-1})}$ be the set of all nonzero different derivates 
(including p itself, which is the derivate with respect to the pair $(\lambda, \lambda)$) of p. 
Then any other nonzero derivate of p is included in this set, hence the conditions 
of Theorem 1.2 hold. 


Remark: In Section B3, Exercise 9, we introduced the definition of an 
observer/state-calculable machine. These machines have at most one nonzero 
element in each row of their matrices. Now it is readily shown that if M is 
such a machine and $p_{s_i}^M$ is considered as a relation in P(X, Y) for the appropriate 
X and Y, then this relation has only a finite number of nonzero different 
derivates. To prove this claim, we note first that 

$$p^M_{\bar s_i(x,y)}(v|u) = \frac{p^M_{s_i}(yv|xu)}{p^M_{s_i}(y|x)}$$

by (10) (Section A,1), and therefore the number of different nonzero relations 
of the form $p_{i(u,v)}$ equals that of nonzero vectors of the form $\bar s_i(u, v)$. If M is 
an observer/state-calculable machine, then by the definitions any vector of the 
form $s_i(v|u) = s_i A^M(v|u)$ has at most one nonzero entry. Thus any nonzero 
vector of the form $\bar s_i(u, v)$, which is $s_i(v|u)$ multiplied by a normalizing constant, 
is a stochastic degenerate vector and there exists only a finite number of such 
vectors. Furthermore, a closer look at Corollary 1.3 above and its proof shows that 
the machine M in that corollary can be chosen to be observer/state-calculable, 
for the states of M are identified with the derivates $p, p_{(u_1,v_1)}, \ldots, p_{(u_{n-1},v_{n-1})}$ and 
the transition between these states is deterministic. We thus have the following 
characterization: 


Theorem 1.4: Let $p \in P(X, Y)$ be a relation. There exists an observer/state-calculable 
machine M such that $p = p_{s_1}^M$ if and only if the set of nonzero 
derivates of p is finite. 

Another corollary to Theorem 1.2 is: 

Corollary 1.5: Let $p \in P(X, Y)$ be a relation. There exists a machine M and 
an initial vector $\pi$ for M such that $p = p_\pi^M$ if and only if there exists a finite set 
of functions $p_1, \ldots, p_n$ in P(X, Y) such that $p \in \operatorname{conv}(p_1, \ldots, p_n)$, and for every 
i, and every pair (x, y) such that $p_i(y|x) \ne 0$, also $p_{i(x,y)} \in \operatorname{conv}(p_1, \ldots, p_n)$. 


The proof is straightforward and is included in the exercises below. 


EXERCISES 

1. Prove properties (3) and (4) in Theorem 1.1. 
2. Prove Corollary 1.5. 

3. Prove the “only if” part of Theorem 1.2. 


2. Compound Sequence Matrix 


Definition 2.1: Let $(u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n), (u_1', v_1'), (u_2', v_2'), \ldots, (u_n', v_n')$ 
be a set of 2n pairs of sequences and $p \in P(X, Y)$ a relation. The matrix 

$$P = [p(v_i v_j'|u_i u_j')]$$

is then called a compound sequence matrix, and its determinant a compound 
sequence determinant. 

Definition 2.2: The rank r(p) of a relation p is the maximum among the ranks 
of all compound sequence matrices which can be formed from p, or $+\infty$ if 
no such maximum exists. 
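
To make Definitions 2.1 and 2.2 concrete, the sketch below builds one compound sequence matrix for the relation induced by a hypothetical two-state initiated machine and computes its rank exactly. The machine, the chosen pairs, and the helper names `compound_matrix` and `rank` are assumptions for the illustration; a single matrix only gives a lower bound for r(p), since the definition takes a maximum over all compound sequence matrices.

```python
from fractions import Fraction as F

# Hypothetical two-state machine with one input symbol 0; keys are (y, x).
A = {(0, 0): [[F(1, 2), F(0)], [F(0), F(1, 4)]],
     (1, 0): [[F(0), F(1, 2)], [F(3, 4), F(0)]]}
PI = [F(1, 2), F(1, 2)]

def p(v, u):
    """Relation induced by (M, pi): pi A(v_1|u_1) ... A(v_k|u_k) eta."""
    vec = list(PI)
    for y, x in zip(v, u):
        m = A[(y, x)]
        vec = [sum(vec[i] * m[i][j] for i in range(len(vec))) for j in range(len(m[0]))]
    return sum(vec)

def compound_matrix(rel, row_pairs, col_pairs):
    """P = [rel(v_i v_j' | u_i u_j')] for pairs given as (u, v) tuples."""
    return [[rel(v + v2, u + u2) for (u2, v2) in col_pairs] for (u, v) in row_pairs]

def rank(mat):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [row[:] for row in mat]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Pairs (u, v): the empty pair, then input 0 with output 0, then input 0 with output 1.
pairs = [((), ()), ((0,), (0,)), ((0,), (1,))]
P = compound_matrix(p, pairs, pairs)
r = rank(P)
```

Here the 3 x 3 matrix has rank 2, which does not exceed the number of states of the machine that induces the relation.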
Corollary 2.1: If $p \in P(X, Y)$ is a relation such that $p = p_\pi^M$ for some machine 
M and some distribution $\pi$, then r(p) = rank $(M, \pi)$. 


Proof: Let $P(M, \pi)$ be a compound sequence matrix for $p_\pi^M$. Any entry in 
$P(M, \pi)$ has the form $p_\pi^M(v_i v_j'|u_i u_j') = \pi(v_i|u_i)\eta(v_j'|u_j')$, so that the matrix 
$P(M, \pi)$ can be written in the form $P(M, \pi) = G'H$, where G' is the matrix 
whose rows are the vectors $\pi(v_i|u_i)$ and H the matrix whose columns are the 
vectors $\eta(v_j'|u_j')$. Let G be the matrix whose rows are the vectors $\bar\pi(u_i, v_i)$ 
corresponding to $\pi(v_i|u_i)$ in G'. Since the rows in G' differ from those in G by a 
multiplicative constant, obviously rank G' = rank G, so that rank $P(M, \pi)$ = 
rank (GH). It follows that 

$$\operatorname{rank} p = \operatorname{rank} p_\pi^M = \max_{P(M,\pi)} \operatorname{rank} P(M, \pi) = \max_{G,H} \operatorname{rank}(GH) = \operatorname{rank}(G^{(M,\pi)} H^M) = \operatorname{rank} H^{(M,\pi)} = \operatorname{rank}(M, \pi)$$

(See Section B.7 for the definitions.) 


Lemma 2.2: Let p be a relation of finite rank n, and P a compound sequence 
matrix for p of rank n. Another compound sequence matrix P', also of rank n, 
can be found such that the pairs $(u_1, v_1)$ and $(u_1', v_1')$ in the sequence defining 
P' satisfy 

$$(\lambda, \lambda) = (u_1, v_1) = (u_1', v_1') \qquad (39)$$

Proof: Let $(u_1, v_1), \ldots, (u_n, v_n), (u_1', v_1'), \ldots, (u_n', v_n')$ be the sequence defining 
P, and (u, v), (u', v') any two input-output pairs. The following determinant 
then equals zero 

$$\begin{vmatrix} P & \begin{matrix} p(v_1 v'|u_1 u') \\ \vdots \\ p(v_n v'|u_n u') \end{matrix} \\ p(vv_1'|uu_1') \;\cdots\; p(vv_n'|uu_n') & p(vv'|uu') \end{vmatrix} = 0 \qquad (40)$$

since P is a regular compound sequence matrix of maximal possible rank n. 
Substituting $u = v = u' = v' = \lambda$ in (40) and expanding about the last column, 
we obtain ($p(\lambda|\lambda) = 1$) that $|P| + \sum_i a_i|P_i| = 0$, where the $a_i$ are numerical 
constants and $P_i$ is derived from P by replacing the ith row with $p(v_1'|u_1') \cdots 
p(v_n'|u_n')$. Thus one of the $P_i$ is a regular matrix and P in (40) can be replaced 
with $P_i$. Using the same argument for the new determinant (40), expanding 
this time about the last row, we find that there exists a regular matrix $(P_i)^j$ 
derived from $P_i$ by replacing its jth column with 

$$\begin{bmatrix} p(v_1|u_1) \\ \vdots \\ p(v_n|u_n) \end{bmatrix}$$

We thus have a regular matrix $(P_i)^j$ derived from P by replacing $(u_i, v_i)$ and 
$(u_j', v_j')$ with the pair $(\lambda, \lambda)$. Appropriate reordering of rows and columns yields 
a compound sequence matrix with the required properties. 
EXERCISES
1. Prove that any relation $p$ of rank 1 has the property
$p(vv' \mid uu') = p(v \mid u)\,p(v' \mid u')$.


2. Prove that the set of relations of finite rank < n is closed under convex 
combinations of its elements. 


3. Prove that the set of relations of finite rank < n is closed under left 
derivation. 


4. Show that Theorem 1.2 can be refined as follows. 


Theorem: Let $p_1, \ldots, p_k$ be a finite set of relations in $A(X, Y)$. There exists an $n$-state machine $M$ such that $p_i = p_{\pi_i}^M$ if and only if for every $i$ rank $p_i \le n$, and for every $i$ and every pair $(x, y)$ such that $p_i(y \mid x) \ne 0$, the segment $[p_{i(x,y)}]_{2n-1}$ is a convex combination of the segments $[p_j]_{2n-1}$.


5. Refine Corollary 1.5 using Exercise 4 above. 


3. Representability of Relations by Machines 


Expanding the determinant in (40) about its last column, we obtain
$$p(vv' \mid uu') = \sum_i a_i(v \mid u)\, p(v_i v' \mid u_i u') \tag{41}$$

where the $a_i(v \mid u)$ are functions of the entries of matrix $P$ and of the values $p(v v_j' \mid u u_j')$. Replacement of $v, v', u, u'$ with $v_i v$, $v' v_j'$, $u_i u$, $u' u_j'$ respectively in (41) yields

$$p(v_i v v' v_j' \mid u_i u u' u_j') = \sum_k a_k(v_i v \mid u_i u)\, p(v_k v' v_j' \mid u_k u' u_j') \tag{42}$$
or, in matrix form,
$$P(vv' \mid uu') = A(v \mid u)\, P(v' \mid u') \tag{43}$$
where
$$P(v \mid u) = [p(v_i v v_j' \mid u_i u u_j')] \quad\text{and}\quad A(v \mid u) = [a_k(v_i v \mid u_i u)] \tag{44}$$
($P(\lambda \mid \lambda) = P$ and $A(\lambda \mid \lambda) = I$, the identity matrix.) In particular, we have
$$P(v \mid u) = A(v \mid u)\, P \tag{45}$$
and
$$P(y \mid x) = A(y \mid x)\, P \quad\text{or}\quad A(y \mid x) = P(y \mid x)\, P^{-1} \tag{46}$$

Thus if $P$ (which is regular) and $P(y \mid x)$ are given, $A(y \mid x)$ is obtainable from (46). Substitution of (45) on both sides of (43), with the common (regular) matrix $P$ cancelled out, yields
$$A(vv' \mid uu') = A(v \mid u)\, A(v' \mid u') \tag{47}$$
The above formulas lead to the following


Theorem 3.1: Given a regular compound sequence matrix $P$ of maximal rank $n$ for a relation $p$, a pseudostochastic sequential machine $M$ with $n$ states (see Exercise 5, Section B.6) and an initial distribution $\pi$ can be found effectively such that $p = p_\pi^M$.


Proof: Using Lemma 2.2, construct another compound sequence matrix $P'$ satisfying (39). Compute the matrices $A(y \mid x)$, using (46). [It is assumed that the matrices $P'(y \mid x)$ are available.] Let $Q$ be any regular matrix of order $n$ such that $Q\eta$ equals the first column of $P$ and the first row of $Q$ is nonnegative [which implies that it is a probabilistic vector, as the 1, 1 entry in $P$ is 1]. Define

$$A^M(y \mid x) = Q^{-1} A(y \mid x)\, Q \tag{48}$$
and $\pi$ = the first row of $Q$.
Let $M$ be the pseudomachine whose matrices are $A^M(y \mid x)$ with initial distribution $\pi$. Then by (47) we have that

$$A^M(v \mid u) = Q^{-1} A(v \mid u)\, Q$$

hence, $p_\pi^M(v \mid u) = \pi A^M(v \mid u)\eta = \pi Q^{-1} A(v \mid u) Q \eta = s_1 A(v \mid u) P_1$, with $P_1$ denoting the first column of $P$ [$\pi Q^{-1} = s_1 = (1\ 0\ \cdots\ 0)$, as $\pi$ is the first row of $Q$ by construction]. But $A(v \mid u) P = P(v \mid u)$ by (45), hence $A(v \mid u) P_1$ is the first column of $P(v \mid u)$, so that $p_\pi^M(v \mid u) = s_1 A(v \mid u) P_1 = s_1 P_1(v \mid u)$ = the 1, 1 entry in $P(v \mid u)$ = $p(v \mid u)$ as required. ∎


Theorem 3.2: Let $p$ be a relation of finite rank $\le n$ such that the values $p(v \mid u)$ are recursively computable. [In other words, $p(v \mid u)$ with $l(v, u) = k$ is obtainable effectively from the values $p(v' \mid u')$ with $l(v', u') \le k - 1$.] Then a regular compound sequence matrix of order $n$ can be formed from a segment $[p]_{2n-2}$ of $p$.


Proof: If $p$ is a relation of finite rank $\le n$ then by definition there exists a regular compound sequence matrix of maximal rank $\le n$ for $p$. By Theorem 3.1 there exists a pseudomachine $M$ with $\le n$ states and a distribution $\pi$ such that $p = p_\pi^M$. Recalling the construction of matrices $H^M$ [Section B.1] and $G^{(M,\pi)}$ [Exercise 5, Section B.1] we see that it is not affected by the fact that $M$ is a pseudomachine. Thus $H^M$ and $G^{(M,\pi)}$ exist for a pseudomachine $M$ such that the values $\eta^M(v \mid u)$ in the former and $\pi(v \mid u)$ in the latter correspond to pairs of length $\le n - 1$, and $\operatorname{rank} G^{(M,\pi)} H^M = \operatorname{rank} H^{(M,\pi)} \le n$. But the entries in $H^{(M,\pi)}$ are of the form

$$\bar\pi(v \mid u)\,\eta^M(v' \mid u') = a\,\pi(v \mid u)\,\eta^M(v' \mid u') = a\,p_\pi^M(vv' \mid uu')$$

where $a$ is a normalizing constant depending only on the row $\bar\pi(v \mid u)$ of $G^{(M,\pi)}$ and $l(vv' \mid uu') \le 2n - 2$. The required matrix $P$ can thus be derived from the
above $H^{(M,\pi)}$, which is regular for pseudomachines [prove this fact!], by division of its rows by an appropriate constant. ∎


Corollary 3.3: Let $p$ be a relation satisfying the conditions of Theorem 3.2. A pseudostochastic machine $M$ with $\le n$ states and an initial distribution $\pi$ can then be found effectively such that $p = p_\pi^M$.


Proof: By Theorems 3.1 and 3.2. ∎


Corollary 3.4: Let $p$ be a relation satisfying the conditions of Theorem 3.2. Then its segment $[p]_{2n-1}$ uniquely determines the whole relation.


Proof: The initiated pseudomachine $(M, \pi)$ such that $p = p_\pi^M$ is determined by $[p]_{2n-1}$, as the required compound sequence determinant $P$ is obtainable from $[p]_{2n-2}$ and the matrices $A(y \mid x)$ [see (46)] depend on $P$ and $P(y \mid x)$, which can be derived from $[p]_{2n-1}$ by (44). ∎

Corollary 3.5: Let $p$ be a relation of finite rank $n$ satisfying the conditions of Theorem 3.2. If there exists a true initiated machine $(M, \pi)$ with $n$ states and $p = p_\pi^M$, then there exist also a compound sequence determinant $P$ of rank $n$ for $p$ and a nonsingular matrix $Q$ such that $A^M(y \mid x) = Q^{-1} A(y \mid x)\, Q$, where $A(y \mid x)$ is defined as in (46).

Proof: Under the assumptions, $P$ may be chosen as the matrix $P(M, \pi)$ defined in the proof of Corollary 2.1. Thus $P = P(M, \pi) = G'^{(M,\pi)} H^M$ [see definitions in the proof of Corollary 2.1] and

$$A(y \mid x)\, G'^{(M,\pi)} H^M = A(y \mid x)\, P = P(y \mid x) = G'^{(M,\pi)} A^M(y \mid x)\, H^M$$
As $M$ has $n$ states and $p_\pi^M$ is of rank $n$, $H^M$ has $n$ independent columns and $n$ rows. Thus $H^M$ together with $G'^{(M,\pi)}$ and $P$ are regular matrices, implying that $(G'^{(M,\pi)})^{-1} A(y \mid x)\, G'^{(M,\pi)} = A^M(y \mid x)$ as required. ∎

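Example 11: Consider the relation (assumed to be of rank 2) in the following table ($X$ is a single-symbol alphabet and is omitted):

v       λ     0      1      00     01     10     11     000    001    010    011
p(v)    1     3/10   7/10   1/4    1/20   7/20   7/20   3/40   7/40   1/40   1/40

Setting $v_1 = \lambda$, $v_2 = 0$, $v_1' = \lambda$, $v_2' = 1$, we have

$$P = \begin{bmatrix} 1 & \tfrac{7}{10} \\ \tfrac{3}{10} & \tfrac{1}{20} \end{bmatrix}$$

which is a regular matrix of rank 2. Proceeding as in Theorem 3.1 we have

$$P(0) = \begin{bmatrix} \tfrac{3}{10} & \tfrac{1}{20} \\ \tfrac{1}{4} & \tfrac{7}{40} \end{bmatrix}, \qquad P(1) = \begin{bmatrix} \tfrac{7}{10} & \tfrac{7}{20} \\ \tfrac{1}{20} & \tfrac{1}{40} \end{bmatrix}$$

$$A(0) = P(0)P^{-1} = \begin{bmatrix} 0 & 1 \\ \tfrac{1}{4} & 0 \end{bmatrix}, \qquad A(1) = P(1)P^{-1} = \begin{bmatrix} \tfrac{7}{16} & \tfrac{7}{8} \\ \tfrac{1}{32} & \tfrac{1}{16} \end{bmatrix}$$

There are many possible alternatives for matrix $Q$. One such choice is

$$Q = \begin{bmatrix} \tfrac{4}{5} & \tfrac{1}{5} \\ \tfrac{2}{5} & -\tfrac{1}{10} \end{bmatrix}$$

so that

$$Q^{-1} = \begin{bmatrix} \tfrac{5}{8} & \tfrac{5}{4} \\ \tfrac{5}{2} & -5 \end{bmatrix}$$

and finally

$$A^M(0) = Q^{-1} A(0)\, Q = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix}, \qquad A^M(1) = Q^{-1} A(1)\, Q = \begin{bmatrix} \tfrac{1}{2} & 0 \\ \tfrac{3}{2} & 0 \end{bmatrix}$$

$\pi = (\tfrac{4}{5}, \tfrac{1}{5})$ and $p_\pi^M = p$. Verification is left to the reader.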
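
The verification of Example 11 can be carried out mechanically. The following is a minimal sketch in exact rational arithmetic, assuming the tabulated values $p(v)$ read as 1, 3/10, 7/10, 1/4, 1/20, 7/20, 7/20, 3/40, 7/40, 1/40, 1/40 and the choice $Q$ with rows (4/5, 1/5) and (2/5, -1/10); the helper names (`matmul`, `inv2`, `seq_matrix`, `p_machine`) are illustrative and do not come from the text.

```python
from fractions import Fraction as F

# Values p(v) of the relation of Example 11 (the single input symbol is omitted).
p = {"": F(1), "0": F(3, 10), "1": F(7, 10),
     "00": F(1, 4), "01": F(1, 20), "10": F(7, 20), "11": F(7, 20),
     "000": F(3, 40), "001": F(7, 40), "010": F(1, 40), "011": F(1, 40)}

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inv2(m):
    """Inverse of a regular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

rows, cols = ["", "0"], ["", "1"]      # v_1, v_2 and v_1', v_2'

def seq_matrix(y):
    """Compound sequence matrix P(y) = [p(v_i y v_j')], as in (44)."""
    return [[p[r + y + c] for c in cols] for r in rows]

P = seq_matrix("")
A = {y: matmul(seq_matrix(y), inv2(P)) for y in "01"}       # formula (46)

Q = [[F(4, 5), F(1, 5)], [F(2, 5), F(-1, 10)]]              # one admissible choice
pi = Q[0]                                                   # first row of Q
AM = {y: matmul(matmul(inv2(Q), A[y]), Q) for y in "01"}    # formula (48)

def p_machine(word):
    """pi * A^M(y_1) ... A^M(y_k) * eta, with eta the all-ones column."""
    vec = [pi]
    for y in word:
        vec = matmul(vec, AM[y])
    return sum(vec[0])

# Every tabulated value is reproduced by the pseudomachine.
print(all(p_machine(v) == p[v] for v in p))
```

Note that $A^M(0)$ has a negative entry, so the machine obtained from this particular $Q$ is a pseudomachine rather than a true machine.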
Let $p$ be a relation. If there exists an initiated (pseudo-) machine $(M, \pi)$ such that $p = p_\pi^M$, then $(M, \pi)$ is said to represent $p$ and $p$ to be representable by a (pseudo-) ISSM.

We are now able to sum up the situation as to the representability of relations.



a. The following theorem is readily proved for the general case: 


Theorem 3.6: A given relation p is representable by a pseudo-ISSM if and only 
if p is of finite rank. 

The “if” part is meaningless unless it is specified how the relation is “given.” 
It is therefore assumed that it is given such that the values p(v|u) are recursively 
computable [as in Theorem 3.2]. 


b. If a relation is given as above and known to be of finite rank, then it is 
also known to be representable. Still, so long as no bound is set on that rank, 
the latter cannot be computed, nor can a representation be found for it [see 
Exercise 5 at the end of this section]. 


c. If a relation is given and a bound set on its rank, then using Corollary 
3.3, a representation can be effectively found for it, but the result is, in general, 
an initiated pseudomachine (with number of states equal to the rank of the 
relation). 


d. Given a relation $p$ which is known to be of finite rank $= n$, no effective answer is known as to whether $p$ is representable by a true ISSM. This last problem can be further subdivided as follows:

Given a relation $p$ of rank $n$, is $p$ representable by a true ISSM $(M, \pi)$ such that rank $M = n$ = the number of states of $M$?


(d-1). If it is, formulate an effective procedure for constructing the representing machine.


(d-2). If case (d-1) does not apply, then is $p$ representable by a true ISSM $(M, \pi)$ such that rank $M = n$ but the number of states of $M$ exceeds $n$? If it is, then formulate an effective procedure for constructing $(M, \pi)$.


(d-3). If neither (d-1) nor (d-2) apply, then is $p$ representable by a true ISSM $(M, \pi)$ such that rank $M > n$? [Note that rank $(M, \pi)$ = rank $p$ may not equal rank $M$.] If it is, then formulate an effective procedure for constructing $(M, \pi)$.


It is readily seen from examples that case (d-1) is not empty. It can also be 
shown that case (d-2) is not empty either [see Exercise 8 at the end of this sec- 
tion]. The author is not aware of any example proving that (d-3) is feasible, 
but there is no reason why it should not be. 


EXERCISES 
1. Prove that any relation of rank 1 is representable by a true ISSM. 


2. Discuss the implications of Exercises 2-5 in Section 2 with regard to the 
decision whether a given relation is representable by a true ISSM. 


3. Consider the relation given in the following table [X is a single-symbol al- 
phabet and is omitted.] Assume that the relation is of rank 2 and find a true 
representation for it. 


v       λ     0      1      00     01     10     11     000    001    010    011
Pe) Leet dv do zo va aA do ds 


Compare with Example 11. 


4. Consider the following initiated pseudomachine $(M, \pi)$ [$X = \{0\}$ and is omitted; $Y = \{0, 1\}$]


4 0 0 1000 
A"(0 —|0 —& 0,  4")-—|$ 0 0 0 
0 o0 4 $000 


m= ($ ds io) 
a. Show that rank $(M, \pi) = 3$.
b. Show that $0 \le p_\pi^M(v) \le 1$ for all $v \in Y^*$.


c. Show that there exists a three-state true initiated machine $(M^*, \pi^*)$ such that $p_\pi^M = p_{\pi^*}^{M^*}$.
5.a. Show that for every relation $p$ of rank $n$ there exists another relation $p'$ such that $p \ne p'$ but $[p]_{n-1} = [p']_{n-1}$.

5.b. What can be said about rank p' apart from the problem of deciding 
representability? 

6. Prove: A relation p is representable by a pseudo-ISSM if and only if p is of 
finite rank. 

7. Find a relation p of rank n, representable by a true n-state ISSM (n chosen 
at convenience). 

8. Consider the following ISSM: $X = \{0, 1\}$; $Y = \{0, 1\}$; $S = \{1, 2, 3, 4\}$; $\pi = (1\ 0\ 0\ 0)$, with the transitions from state to state deterministic as follows


Present state Input Output Next state 
1, 2 0 0 2 
3, 4 0 1 3 
1, 3 1 0 1 
2, 4 1 1 4 




Prove that the above ISSM represents a relation of rank 3, but no true ISSM 
with less than four states can represent it. 


9. Let $p$ be a relation known to be of finite rank. Let $r(k)$ denote the maximal rank of all compound sequence matrices $P(k)$ for $p$ with $P(k) = [p(v_i v_j' \mid u_i u_j')]$ where $l(v_i, u_i),\ l(v_j', u_j') \le k$.

Prove: If r(k) = r(k + 1) = r(k + j) = m, then either rank p = m or rank 
p2m+2j 

10. Give the most efficient algorithm possible for finding rank p < n for a given 
relation p. 


OPEN PROBLEMS 


1. Given a recursively computable relation p, formulate a decision procedure 
for ascertaining whether p is of finite rank, or prove that the problem is not 
decidable. 


2. Given a pseudo-ISSM $(M, \pi)$, does there exist a true ISSM $(M^*, \pi^*)$ such that $(M^*, \pi^*) = (M, \pi)$, rank $M^*$ = rank $M$, and the number of states of $M$ equals that of $M^*$?

Formulate a decision procedure for this problem, or prove that it is not decidable. If a decision procedure exists, give an algorithm for constructing $(M^*, \pi^*)$ whenever possible.

3. Same as 2, except the number of states of $M^*$ is not required to equal that of $M$.

4. Same as 3, except that it is not required that rank $M$ = rank $M^*$.


5. Formulate a decision procedure for the following problem, or prove that it is not decidable: Given any pseudo-ISSM $(M, \pi)$, are all the values $p_\pi^M(v \mid u)$ nonnegative?


4. Bibliographical Notes 


Input-output relations and sequential functions were studied in the deterministic case by Elgot and Mezei (1965), Gill (1966), Gray and Harrison (1966), Raney (1958), Tal (1966), and others. Derivates were introduced by Brzozowski (1964) for the deterministic case. The first subsection here is based on the work of Arbib (1967) with additions from Carlyle (1967), and the second and third subsections on the work of Carlyle (1963a, b, 1965, 1969). The above references also served as a source for some of the exercises. Additional references: Blackwell and Koopmans (1957), Booth (1965, 1966, 1967), Dharmadhikari (1963a, b, 1965, 1967), Fox (1959), Gilbert (1959), Page (1966). Recently, a connection between the theory of categories and that of input-output probabilistic relations was established by Heller (1965, 1967) and Depeyrot (1968). See also Depeyrot (1969a, b).
INTRODUCTION


This chapter is devoted to the theory of nonhomogeneous Markov chains and related topics. Nonhomogeneous Markov chains and systems are studied from a mathematical point of view, with regard to asymptotic behavior, composition (direct sum and product), and decomposition. The last part of this chapter investigates "word functions" induced by Markov chains and valued Markov systems. These functions are studied with regard to characterization, equivalence, and representability by an underlying Markov chain or system. The reader is referred to the Preliminary Section in this book for an introduction and for the basic definitions used (see also the bibliographical remarks at the end of the chapter).


A. NONHOMOGENEOUS MARKOV CHAINS AND SYSTEMS


1. Functionals over Stochastic Matrices 


The matrices to be considered in this subsection are countably infinite unless 
otherwise specified. 
Definition 1.1: Given a stochastic matrix $P = [p_{ij}]$ and an arbitrary vector $\pi = (\pi_i)$ we define

$$d(P) = \sup_j \sup_{i_1, i_2} |p_{i_1 j} - p_{i_2 j}|, \qquad d(\pi) = \sup_{i_1, i_2} |\pi_{i_1} - \pi_{i_2}|$$

$$\delta(P) = \sup_{i_1, i_2} \sup_{\{n'\}} \sum_{j \in \{n'\}} (p_{i_1 j} - p_{i_2 j})$$

where $\{n'\}$ denotes a subsequence of the sequence of natural numbers (to be denoted by $\{n\}$). If $P$ is a finite matrix, then "sup" is to be replaced by "max" and "inf" by "min."

Notation: If $a$ is a real number then $a^+ = \max(a, 0)$ and $a^- = \min(a, 0)$.


Proposition 1.1: $\delta(P) = \sup_{i_1, i_2} \sum_j (p_{i_1 j} - p_{i_2 j})^+$.
The proof is left as an exercise.


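Proposition 1.2: $0 \le d(P) \le \delta(P) \le 1$.
Proof: It is a trivial consequence of the definition that $0 \le d(P)$. For any fixed $j$, $i_1$, and $i_2$ it is clear that
$$(p_{i_1 j} - p_{i_2 j})^+ \le \sup_{\{n'\}} \sum_{k \in \{n'\}} (p_{i_1 k} - p_{i_2 k}) = \sum_k (p_{i_1 k} - p_{i_2 k})^+$$

But $\sup_{i_1, i_2} |p_{i_1 j} - p_{i_2 j}| = \sup_{i_1, i_2} (p_{i_1 j} - p_{i_2 j})^+$, since the indices $i_1$ and $i_2$ are interchangeable, so that

$$\sup_{i_1, i_2} |p_{i_1 j} - p_{i_2 j}| = \sup_{i_1, i_2} (p_{i_1 j} - p_{i_2 j})^+ \le \sup_{i_1, i_2} \sup_{\{n'\}} \sum_{k \in \{n'\}} (p_{i_1 k} - p_{i_2 k})$$

for any fixed $j$, and therefore $d(P) = \sup_j \sup_{i_1, i_2} |p_{i_1 j} - p_{i_2 j}| \le \delta(P)$. Finally, for fixed $i_1$ and $i_2$ we have that

$$\sum_j (p_{i_1 j} - p_{i_2 j})^+ \le \sum_j p_{i_1 j} = 1$$

and using Proposition 1.1 we get that $\delta(P) \le 1$. ∎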
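
For finite matrices the two functionals reduce to maxima and can be computed directly. The following is a minimal sketch, assuming a matrix stored as a list of rows with exact rational entries; the 3 x 4 matrix is a made-up illustration, not an example from the text, and `delta` uses the formula of Proposition 1.1.

```python
from fractions import Fraction as F
from itertools import permutations

def d(P):
    """d(P): largest gap between two entries lying in the same column."""
    return max(max(col) - min(col) for col in zip(*P))

def delta(P):
    """delta(P): maximum over ordered pairs of rows of the summed positive parts."""
    return max(sum(max(a - b, 0) for a, b in zip(r1, r2))
               for r1, r2 in permutations(P, 2))

# Hypothetical stochastic matrix chosen so that d(P) < delta(P).
P = [[F(1, 2), F(1, 2), 0, 0],
     [0, 0, F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]]

print(d(P), delta(P))   # prints: 1/2 1
```

Here the first two rows have disjoint supports, which forces $\delta(P) = 1$ although no single column differs by more than 1/2.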


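Proposition 1.3: If $P = [p_{ij}]$ and $Q = [q_{ij}]$ are stochastic matrices then $\delta(PQ) \le \delta(P)\,\delta(Q)$.
Proof: Fix $i_1$ and $i_2$. We show first that

$$\sum_k (p_{i_1 k} - p_{i_2 k})^+ + \sum_k (p_{i_1 k} - p_{i_2 k})^- = \sum_k (p_{i_1 k} - p_{i_2 k}) = \sum_k p_{i_1 k} - \sum_k p_{i_2 k} = 1 - 1 = 0$$
and, therefore

$$\sum_k (p_{i_1 k} - p_{i_2 k})^+ = -\sum_k (p_{i_1 k} - p_{i_2 k})^- \tag{1}$$

Denoting by $\sum'$ summation over a subset of the set of natural numbers we have
$$\sum_j \Big( \sum_k (p_{i_1 k} - p_{i_2 k})\, q_{kj} \Big)^+ = {\sum_j}' \sum_k (p_{i_1 k} - p_{i_2 k})\, q_{kj} = \sum_k (p_{i_1 k} - p_{i_2 k}) {\sum_j}' q_{kj}$$
$$\le \sum_k (p_{i_1 k} - p_{i_2 k})^+ \sup_k {\sum_j}' q_{kj} + \sum_k (p_{i_1 k} - p_{i_2 k})^- \inf_k {\sum_j}' q_{kj}$$
$$= \sum_k (p_{i_1 k} - p_{i_2 k})^+ \Big( \sup_k {\sum_j}' q_{kj} - \inf_k {\sum_j}' q_{kj} \Big)$$
where the indices involved in the summation $\sum'$ may depend on $i_1$ and $i_2$. But
$$\sup_k {\sum_j}' q_{kj} - \inf_k {\sum_j}' q_{kj} = \sup_{k_1, k_2} {\sum_j}' (q_{k_1 j} - q_{k_2 j}) \le \sup_{k_1, k_2} \sum_j (q_{k_1 j} - q_{k_2 j})^+ = \delta(Q)$$
which is independent of $i_1$ and $i_2$.
Thus, $\sum_j \big( \sum_k (p_{i_1 k} - p_{i_2 k})\, q_{kj} \big)^+ \le \sum_k (p_{i_1 k} - p_{i_2 k})^+ \,\delta(Q)$, so that
$$\delta(PQ) = \sup_{i_1, i_2} \sum_j \Big( \sum_k (p_{i_1 k} - p_{i_2 k})\, q_{kj} \Big)^+ \le \sup_{i_1, i_2} \Big[ \sum_k (p_{i_1 k} - p_{i_2 k})^+ \,\delta(Q) \Big] \le \delta(Q) \sup_{i_1, i_2} \sum_k (p_{i_1 k} - p_{i_2 k})^+ = \delta(P)\,\delta(Q) \quad ∎$$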
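
The submultiplicativity of $\delta$ can be checked numerically on small matrices. The sketch below assumes finite matrices stored as lists of rows; the two 3 x 3 stochastic matrices are arbitrary illustrations and are not taken from the text.

```python
from fractions import Fraction as F
from itertools import permutations

def delta(P):
    """delta(P) for a finite matrix, via the formula of Proposition 1.1."""
    return max(sum(max(a - b, 0) for a, b in zip(r1, r2))
               for r1, r2 in permutations(P, 2))

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Two hypothetical stochastic matrices.
P = [[F(1, 2), F(1, 2), 0],
     [0, F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)]]
Q = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 4), F(1, 4), F(1, 2)]]

lhs = delta(matmul(P, Q))
rhs = delta(P) * delta(Q)
print(lhs, rhs, lhs <= rhs)
```

For this particular pair the bound happens to be attained, which shows that the inequality of Proposition 1.3 cannot be improved in general.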


Definition 1.2: If $\xi = (\xi_i)$ is an arbitrary vector and $P$ an arbitrary matrix, then $|\xi| = \sup_i |\xi_i|$, $|P| = \sup_{i,j} |p_{ij}|$, $\|\xi\| = \sum_i |\xi_i|$ provided that $\sum_i |\xi_i| < \infty$ and $\|\xi\| = \infty$ otherwise; $\|P\| = \sup_i \sum_j |p_{ij}|$ provided that $\sum_j |p_{ij}| < \infty$ for all $i$, and $\|P\| = \infty$ otherwise.


Proposition 1.4: Let $P = [p_{ij}]$ be a stochastic matrix and let $\xi$ be a nonzero vector of the same dimension as $P$ such that $\|\xi\| < \infty$ and $\sum_i \xi_i = 0$ [$\xi = (\xi_i)$]; then $\|\xi P\| \le \|\xi\|\,\delta(P)$.


Proof: Define the vectors $\zeta^1 = (\zeta_i^1)$ and $\zeta^2 = (\zeta_i^2)$ as

$$\zeta_i^1 = \frac{2\xi_i^+}{\|\xi\|} \quad\text{and}\quad \zeta_i^2 = \frac{-2\xi_i^-}{\|\xi\|}$$

Then using an argument similar to the one used in proving formula (1) we have that both $\zeta^1$ and $\zeta^2$ are stochastic vectors and $\zeta^1 - \zeta^2 = 2(\xi / \|\xi\|)$.

Let $Q$ be a matrix such that its first row is $\zeta^1$, all the other rows being equal to $\zeta^2$. Then

$$2\delta(QP) = 2 \sum_j \Big( \sum_i (\zeta_i^1 - \zeta_i^2)\, p_{ij} \Big)^+ = \sum_j \Big| \sum_i (\zeta_i^1 - \zeta_i^2)\, p_{ij} \Big|$$

again using the formula (1).
By the definition of $\zeta^1$ and $\zeta^2$ we have that

$$\sum_j \Big| \sum_i (\zeta_i^1 - \zeta_i^2)\, p_{ij} \Big| = \sum_j \Big| \sum_i \frac{2\xi_i}{\|\xi\|}\, p_{ij} \Big| = \frac{2\|\xi P\|}{\|\xi\|}$$
Thus, $\|\xi P\| / \|\xi\| = \delta(QP) \le \delta(Q)\,\delta(P) \le \delta(P)$ by Propositions 1.2 and 1.3, so that $\|\xi P\| \le \|\xi\|\,\delta(P)$. ∎
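
A small numerical instance may help in reading Proposition 1.4. The sketch below assumes a finite stochastic matrix and a zero-sum row vector, both chosen arbitrarily for illustration; `norm` is the sum of absolute values of Definition 1.2.

```python
from fractions import Fraction as F
from itertools import permutations

def delta(P):
    """delta(P) for a finite matrix, via the formula of Proposition 1.1."""
    return max(sum(max(a - b, 0) for a, b in zip(r1, r2))
               for r1, r2 in permutations(P, 2))

def norm(xi):
    """||xi||: sum of the absolute values of the entries."""
    return sum(abs(x) for x in xi)

# Hypothetical stochastic matrix and zero-sum vector.
P = [[F(1, 2), F(1, 2), 0],
     [0, F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)]]
xi = [F(1), F(-1, 2), F(-1, 2)]

# Row vector times matrix.
xiP = [sum(x * row[j] for x, row in zip(xi, P)) for j in range(len(P[0]))]

print(norm(xiP), norm(xi) * delta(P))   # prints: 5/6 1
```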


Corollary 1.5: If $P$ is a matrix such that all its rows have the properties of the vector $\xi$ in Proposition 1.4 and $Q$ is a stochastic matrix then

$$\|PQ\| \le \|P\|\,\delta(Q)$$

Corollary 1.6: If $P$ and $Q$ are stochastic matrices, then $\|PQ - Q\| \le 2\delta(Q)$. In particular, if $\pi$ is a stochastic vector and $\rho$ is a row of $Q$, then $\|\pi Q - \rho\| \le 2\delta(Q)$.


Proof: $\|PQ - Q\| = \|(P - I)Q\| \le \|P - I\|\,\delta(Q) \le (\|P\| + \|I\|)\,\delta(Q) = 2\delta(Q)$. [See Exercise 8 at the end of this section.]


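Definition 1.3: Given a stochastic matrix $P = [p_{ij}]$, $\gamma(P)$ is defined as
$$\gamma(P) = \inf_{i_1, i_2} \sum_j \min(p_{i_1 j}, p_{i_2 j})$$

Proposition 1.7: Let $P$ be a stochastic matrix, then $\delta(P) = 1 - \gamma(P)$.
Proof: Denote

$$\gamma_{i_1 i_2}(P) = \sum_j \min(p_{i_1 j}, p_{i_2 j}) \quad\text{and}\quad \delta_{i_1 i_2}(P) = \sum_j (p_{i_1 j} - p_{i_2 j})^+$$
then
$$\delta_{i_1 i_2}(P) = \sum_j (p_{i_1 j} - p_{i_2 j})^+ = \sum_j \big( p_{i_1 j} - \min(p_{i_1 j}, p_{i_2 j}) \big) = 1 - \sum_j \min(p_{i_1 j}, p_{i_2 j}) = 1 - \gamma_{i_1 i_2}(P)$$

Therefore, $\delta(P) \ge \delta_{i_1 i_2}(P) = 1 - \gamma_{i_1 i_2}(P)$, which implies that $\delta(P) \ge 1 - \gamma(P)$. Similarly, $\delta_{i_1 i_2}(P) = 1 - \gamma_{i_1 i_2}(P) \le 1 - \gamma(P)$, which implies that $\delta(P) \le 1 - \gamma(P)$. Combining the two inequalities we have that $\delta(P) = 1 - \gamma(P)$. ∎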
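
The identity $\delta(P) = 1 - \gamma(P)$ is easy to confirm on a finite example. The sketch below assumes a matrix stored as a list of rows; the matrix itself is an arbitrary illustration.

```python
from fractions import Fraction as F
from itertools import combinations, permutations

def delta(P):
    """delta(P) for a finite matrix, via the formula of Proposition 1.1."""
    return max(sum(max(a - b, 0) for a, b in zip(r1, r2))
               for r1, r2 in permutations(P, 2))

def gamma(P):
    """gamma(P): minimum over pairs of rows of the summed entrywise minima."""
    return min(sum(min(a, b) for a, b in zip(r1, r2))
               for r1, r2 in combinations(P, 2))

# Hypothetical stochastic matrix.
P = [[F(1, 2), F(1, 2), 0],
     [0, F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)]]

print(gamma(P), delta(P), delta(P) == 1 - gamma(P))   # prints: 1/2 1/2 True
```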
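Proposition 1.8: If $P$ and $Q$ are stochastic matrices and $\eta$ a column vector such that $|\eta_i| \le 1$ for all $i$, then $d(PQ) \le \delta(P)\,d(Q)$ and $d(P\eta) \le \delta(P)\,d(\eta)$.


Proof: It suffices to prove the second inequality. Let $i_1$ and $i_2$ be two arbitrary rows in $P$. Since $\sum_j |p_{i_1 j} - p_{i_2 j}| \le 2$, we can find for any given $\epsilon$ a number $k_0$ such that

$$\sum_{j = k_0 + 1}^{\infty} |p_{i_1 j} - p_{i_2 j}| < \epsilon$$

Let $\eta_{j_0} = \min_{j \le k_0} \eta_j$ and assume that $\sum_j p_{i_1 j}\eta_j \ge \sum_j p_{i_2 j}\eta_j$; then
$$0 \le \sum_j p_{i_1 j}\eta_j - \sum_j p_{i_2 j}\eta_j$$

and replacing $p_{i_1 j_0}$ by $1 - \sum_{j \ne j_0} p_{i_1 j}$ and $p_{i_2 j_0}$ by $1 - \sum_{j \ne j_0} p_{i_2 j}$ we get
$$\sum_j p_{i_1 j}\eta_j - \sum_j p_{i_2 j}\eta_j = \sum_j p_{i_1 j}(\eta_j - \eta_{j_0}) - \sum_j p_{i_2 j}(\eta_j - \eta_{j_0}) = \sum_j (p_{i_1 j} - p_{i_2 j})(\eta_j - \eta_{j_0}) \le \sum_{j=1}^{k_0} (p_{i_1 j} - p_{i_2 j})(\eta_j - \eta_{j_0}) + 2\epsilon$$

By the definition of $\eta_{j_0}$ all the terms of the form $\eta_j - \eta_{j_0}$ are nonnegative in the above sum, so that by omitting the terms such that $p_{i_1 j} < p_{i_2 j}$ the sum is increased. Also $(\eta_j - \eta_{j_0}) \le d(\eta)$, with the result that

$$\sum_j p_{i_1 j}\eta_j - \sum_j p_{i_2 j}\eta_j \le \sum_j (p_{i_1 j} - p_{i_2 j})^+ \,d(\eta) + 2\epsilon \le \delta(P)\,d(\eta) + 2\epsilon$$

Since $\epsilon > 0$ is arbitrarily small and $i_1$, $i_2$ are arbitrary, the proposition follows. ∎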


Proposition 1.9: If $P$ and $Q$ are stochastic matrices and $\eta$ is a vector as in Proposition 1.8, then $|P\eta - \eta| \le d(\eta)$ and $|PQ - Q| \le d(Q)$.


Proof: The same method used in the proof of the previous proposition can be used here, beginning with the inequality $0 \le \sum_j p_{ij}\eta_j - \sum_j \epsilon_{ij}\eta_j$, where $\epsilon_i = (\epsilon_{ij})$ is equal to 0 except for a unique, but arbitrary, entry which is equal to 1, and continuing the same way as in the previous proof. The details are left to the reader. ∎
Example: Let P be the matrix 


e 
w= 
w 


aj 
n= 
aj 


1 0 1 

z El 
then ||P|| = 1 [this is true for any stochastic matrix]; |P| = 2; d(P) = the 
maximal distance between two elements in the same column = 4 [ = |pi; — 


pul] 9(P) = + [ = 3i Gu — 51)'] and y(P) = £ (= 2X: min(pi, p5)). 
The inequalities proved in this section are easily verified. 


EXERCISES 


1. Prove Proposition 1.1. 

2. Prove Proposition 1.9. 

3. Illustrate by examples all the inequalities proved in this section. 
4. If $P$ is a finite stochastic matrix of order $n$, then


a. $d(P) \ge (1/n)\,\delta(P)$.
b. It is possible that $d(P) < 1$ and $\delta(P) = 1$.
c. $d(P) = 0$ if and only if $\delta(P) = 0$.

If $P$ is an infinite stochastic matrix, then
d. For any $\epsilon$, there is $P$ such that $d(P) < \epsilon$ but $\delta(P) = 1$.

e. $d(P) = 0$ if and only if $\delta(P) = 0$.
5. Prove: If $\gamma(P) \ne 0$ for a stochastic matrix $P$, then $\gamma(P)$ is not smaller than the minimal nonzero entry in $P$ and is not smaller than the sum of the minimal elements in the columns of $P$.


6. Prove that every stochastic matrix $P$ can be expressed in the form $P = E + Q$ where $E$ is a stochastic constant matrix and $\|Q\| \le 2\delta(P)$.


7. Prove: If $P$ is a constant stochastic matrix, then $\delta(P) = d(P) = 0$ [$\gamma(P) = 1$]; if $P$ is a degenerate nonconstant stochastic matrix, then $\delta(P) = d(P) = 1$ [$\gamma(P) = 0$].

8. Prove that the functionals "$\|\ \|$" and "$|\ |$" have the following properties: For any matrices $P$, $Q$ and real number $\alpha$ it is true that $\|P\| \ge 0$, $\|P + Q\| \le \|P\| + \|Q\|$, $\|P\| = 0$ if and only if $P = 0$, $\|\alpha P\| = |\alpha|\,\|P\|$ [defining $0 \cdot \infty = \infty$], and similarly for "$|\ |$".


9. Let $P$ be a Markov matrix representing a given Markov process. Let $t_{ij}$ be the probability that the process will pass from both states $i$ and $j$ to some common consequent state in the first step. Prove that $t_{ij} > 0$, for any two states $i$ and $j$, if and only if $\gamma(P) > 0$.


10. Prove that for arbitrary matrices $A$ and $B$,
$\|AB\| \le \|A\|\,\|B\|$


11. Let $A_1, \ldots, A_n$ and $\bar A_1, \ldots, \bar A_n$ be two sets of $n$ matrices such that $\|A_i - \bar A_i\| \le \epsilon$ for $i = 1, 2, \ldots, n$; then $\big\| \prod_{i=1}^{n} A_i - \prod_{i=1}^{n} \bar A_i \big\| \le n\epsilon$.

12. Let $P$ be a Markov matrix and let $P_{i_0}$ be the Markov matrix such that all its rows are equal to the $i_0$ row of $P$. Prove that $\delta(P) \ge \tfrac{1}{2}\|P - P_{i_0}\|$ but for every $\epsilon$ there is an index $i_0$ such that $\delta(P) \le \tfrac{1}{2}\|P - P_{i_0}\| + \tfrac{1}{2}\epsilon$.


13. A doubly stochastic matrix is a stochastic matrix $P = [p_{ij}]$ such that both $\sum_j p_{ij} = 1$, $i = 1, 2, \ldots$ and $\sum_i p_{ij} = 1$, $j = 1, 2, \ldots$, i.e., the sum of the entries in any column is also equal to 1. Prove:

a. If $P$ is doubly stochastic and $\delta(P) = 0$, then $P$ is of finite order, say $n$, and all the entries of $P$ are equal to $1/n$.

b. The set of doubly stochastic matrices is closed under multiplication [since $I$ is doubly stochastic this implies that the set of doubly stochastic matrices is a monoid].
c. If $P$ is a doubly stochastic matrix of finite order $n$ such that $\delta(P) < 1$, and $E$ is a matrix all the entries of which are equal to $1/n$, then

$$\lim_{k \to \infty} \|P^k - E\| = 0 \quad [\lim_{k \to \infty} P^k = E]$$

d. If $P$ is a countable doubly stochastic matrix, then $\delta(P) = 1$.
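14. Consider the following Markov matrix

$$P = \begin{bmatrix} 1 - p & p \\ q & 1 - q \end{bmatrix}, \qquad p + q > 0, \quad p, q \ge 0$$

Prove that
$$\lim_{n \to \infty} P^n = \begin{bmatrix} \dfrac{q}{p + q} & \dfrac{p}{p + q} \\[2mm] \dfrac{q}{p + q} & \dfrac{p}{p + q} \end{bmatrix}$$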


15. Prove that $\sup (\|\xi P\| / \|\xi\|) = \delta(P)$, where $\xi$ ranges over vectors such that $\|\xi\| < \infty$ and $\sum_i \xi_i = 0$.


16. Prove that any vector $\xi$ such that $\|\xi\| < \infty$ and $\sum_i \xi_i = 0$ can be expressed in the form $\xi = \sum_{i=1}^{\infty} \zeta_i$, where the vectors $\zeta_i = (\zeta_{ij})$ have only two nonzero entries, $\|\zeta_i\| < \infty$, $\sum_j \zeta_{ij} = 0$, and $\|\xi\| = \sum_i \|\zeta_i\|$.


2. Nonhomogeneous Markov Chains 


The different functionals $d$, $\delta$, $\gamma$ introduced in the previous section provide, in a certain sense, a measure of the "distance" between two arbitrary rows of a given stochastic matrix. Thus if the matrix $P$ is constant, then $\delta(P) = d(P) = 0$ and $\gamma(P) = 1$ [see Exercise 7 in the previous section]. These functionals will be used subsequently for studying the long-range behavior of Markov chains. As mentioned before, a nonhomogeneous Markov chain can be represented by an infinite sequence of Markov matrices $\{P_i\}_{i=1}^{\infty}$ such that the matrix $P_i$ represents the transition probabilities of the system from state to state at time $t = i$. Let $H_{mn}$ be defined as the matrix
$$H_{mn} = \prod_{t = m+1}^{n} P_t$$

then the $ij$ entry in $H_{mn}$ is the probability that the system will enter the state $j$ at time $t = n$ if it was at state $i$ at time $t = m$. We shall now distinguish between two cases for the long-range behavior of a given Markov chain.

Case 1: $\lim_{n \to \infty} \delta(H_{mn}) = 0$, $m = 0, 1, 2, \ldots$. In this case the chain is called weakly ergodic.

Case 2: For any given $m$ there exists a constant stochastic matrix $Q$ such that $\lim_{n \to \infty} \|H_{mn} - Q\| = 0$. In this case the chain is called strongly ergodic.
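
The behavior described in Case 1 can be observed on a toy chain. The sketch below assumes a hypothetical nonhomogeneous chain that alternates between two fixed 2 x 2 stochastic matrices `A` and `B` (neither taken from the text) and computes $\delta(H_{mn})$ exactly.

```python
from fractions import Fraction as F
from itertools import permutations

def delta(P):
    """delta(P) for a finite matrix, via the formula of Proposition 1.1."""
    return max(sum(max(a - b, 0) for a, b in zip(r1, r2))
               for r1, r2 in permutations(P, 2))

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]   # delta(A) = 1/4
B = [[F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]]   # delta(B) = 1/3

def P(t):
    """Transition matrix at time t of the hypothetical alternating chain."""
    return A if t % 2 else B

def H(m, n):
    """H_mn = P_{m+1} P_{m+2} ... P_n (the identity matrix when n == m)."""
    prod = [[F(1), F(0)], [F(0), F(1)]]
    for t in range(m + 1, n + 1):
        prod = matmul(prod, P(t))
    return prod

for n in (1, 2, 4, 6):
    print(n, delta(H(0, n)))
```

For this chain $\delta(H_{mn})$ shrinks geometrically in $n - m$ whatever the starting time $m$, which is the situation of Case 1.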

In addition to the two above distinctions, there may be other distinctions as 
well (e.g., the matrix Q in the second case may not be constant, or the limit—in 
both cases—may exist only for some m, but not for all m, etc.) but because of 
their restrictive nature those distinctions will not be considered here. We shall 
give now some characterizations of the above defined properties. 
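Theorem 2.1: A Markov chain is weakly ergodic if and only if there exists a subdivision of the chain into blocks of matrices $\{H_{i_j i_{j+1}}\}$ such that $\sum_{j=0}^{\infty} \gamma(H_{i_j i_{j+1}})$ diverges [$i_0 = 0$].

Proof: The condition is sufficient, since the divergence of $\sum_{j=0}^{\infty} \gamma(H_{i_j i_{j+1}})$ implies that for any $j_0$, $\lim_{k \to \infty} \prod_{j = j_0}^{k} (1 - \gamma(H_{i_j i_{j+1}})) = 0$, and using Propositions 1.3, 1.2, and 1.7, we have that

$$\delta\Big( \prod_{i=m}^{N} P_i \Big) \le \delta\Big( \prod_{i_j \ge m} H_{i_j i_{j+1}} \Big) \le \prod_{i_j \ge m} \delta(H_{i_j i_{j+1}}) = \prod_{i_j \ge m} \big( 1 - \gamma(H_{i_j i_{j+1}}) \big)$$

where $i_j \ge m$ means that the product begins with the first index $i_j \ge m$ (and runs over the blocks contained in the range up to $N$). Taking limits on both sides, we get that

$$\lim_{n \to \infty} \delta\Big( \prod_{i=m}^{N} P_i \Big) \le \lim_{N \to \infty} \delta\Big( \prod_{i_j \ge m} H_{i_j i_{j+1}} \Big) = 0$$

with $N = m + n$. Conversely, if $\lim_{n \to \infty} \delta\big( \prod_{i=m}^{m+n} P_i \big) = 0$, $m = 1, 2, \ldots$, then by Proposition 1.7,

$$\lim_{n \to \infty} \gamma\Big( \prod_{i=m}^{m+n} P_i \Big) = \lim_{n \to \infty} \Big( 1 - \delta\Big( \prod_{i=m}^{m+n} P_i \Big) \Big) = 1 - \lim_{n \to \infty} \delta\Big( \prod_{i=m}^{m+n} P_i \Big) = 1$$

Let $0 < \epsilon < 1$ be a small constant; then it follows from the above equalities that a sequence of blocks $H_{i_j i_{j+1}}$ can be found such that $\gamma(H_{i_j i_{j+1}}) > \epsilon$, so that $\sum_{j=0}^{\infty} \gamma(H_{i_j i_{j+1}})$ diverges. ∎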

Theorem 2.2: A given Markov chain is weakly ergodic if and only if for each $m$ there is a sequence of constant Markov matrices $E_{mn}$ such that

$$\lim_{n \to \infty} \|H_{mn} - E_{mn}\| = 0$$

Proof of sufficiency: Let $\epsilon > 0$ be an arbitrary small number and let $i_1$, $i_2$ be two arbitrary indices. Let $H_{mn} = [a_{ij}]$, $E_{mn} = [e_{ij}]$ and suppose that $n$ is so big that $\|H_{mn} - E_{mn}\| < \epsilon$. Then by (1)

$$\sum_j (a_{i_1 j} - a_{i_2 j})^+ = \tfrac{1}{2} \sum_j |a_{i_1 j} - a_{i_2 j}| = \tfrac{1}{2} \sum_j |a_{i_1 j} - e_{i_1 j} + e_{i_2 j} - a_{i_2 j}| \le \tfrac{1}{2} \sum_j |a_{i_1 j} - e_{i_1 j}| + \tfrac{1}{2} \sum_j |a_{i_2 j} - e_{i_2 j}| \le \tfrac{1}{2}(\epsilon + \epsilon) = \epsilon$$

[Clearly $e_{i_1 j} = e_{i_2 j}$, for $E_{mn}$ is a constant matrix.] Since $i_1$ and $i_2$ are arbitrary, we have also that
$$\delta(H_{mn}) = \sup_{i_1, i_2} \sum_j (a_{i_1 j} - a_{i_2 j})^+ \le \epsilon$$

Proof of necessity: Under the same notations as above, let $n$ be so big that $\delta(H_{mn}) < \epsilon / 2$. Let $E_{mn}$ be a matrix such that all its rows are equal to some row, say the $i_0$th, of $H_{mn}$. Then

$$\sum_j |a_{ij} - e_{ij}| = \sum_j |a_{ij} - a_{i_0 j}| = 2 \sum_j (a_{ij} - a_{i_0 j})^+ \le 2\delta(H_{mn}) < 2(\epsilon / 2) = \epsilon$$

Since $i$, $i_0$, and $\epsilon$ are arbitrary we have that
$\|H_{mn} - E_{mn}\| = \sup_i \sum_j |a_{ij} - e_{ij}| \le \epsilon$. ∎
Theorem 2.3: Let $\{P_i\}$ be a given Markov chain and let $P_i = E_i + R_i$ with $E_i$ a constant stochastic matrix. Then the given Markov chain is weakly ergodic if and only if $\lim_{n \to \infty} \big\| \prod_{i=m}^{m+n} R_i \big\| = 0$.
Proof: It follows from Exercise 4 in the preliminary section that $P_i E_j = E_j$ and $E_i P_j$ is constant. Thus $(P_1 - E_1)(P_2 - E_2) = P_1 P_2 - E_1 P_2$ and by induction

$$\prod_{i=m}^{m+n} R_i = \prod_{i=m}^{m+n} P_i - E_m \prod_{i=m+1}^{m+n} P_i$$

where the second term on the right-hand side is constant. It thus follows that the condition of Theorem 2.3 implies the condition of Theorem 2.2 which implies weak ergodicity. On the other hand

$$\Big\| \prod_{i=m}^{m+n} R_i \Big\| = \Big\| (P_m - E_m) \prod_{i=m+1}^{m+n} P_i \Big\| \le \|P_m - E_m\| \,\delta\Big( \prod_{i=m+1}^{m+n} P_i \Big)$$

by Corollary 1.5 and, therefore, weak ergodicity implies the condition of Theorem 2.3. ∎


Examples:
1. Let $\{P_i\}$ be a chain such that there is $\epsilon > 0$ with $\gamma(P_i) \ge \epsilon > 0$ for all $i$ [this condition will hold, for example, if all the entries in all the matrices $P_i$ are $\ge \epsilon$, or even if in every matrix $P_i$ there is a column such that all the entries in that column are $\ge \epsilon$]; then the chain is weakly ergodic by Theorem 2.1.


2. Let $\{P_i\}$ be a chain such that

iii 00 1 
Py = $ $ 0i, Pa =|} £& 0 
0.0 1 iii 


one finds by straightforward computation that 7(H5, ,,,) = Y(P2n-1P2,) = 4 
[check the computation] the condition of the Theorem 2.1 holds true and the 
chain is weakly ergodic. 


3. In the definition of weakly ergodic chains it is required that $\lim_{n \to \infty} \delta(H_{mn}) = 0$ for $m = 0, 1, 2, \ldots$, that is, $\delta(H_{mn}) \to 0$ independently of $m$. This requirement is intended to exclude cases in which the ergodicity of the chain is induced by finitely many matrices in the chain. Consider, e.g., the chain $\{P_i\}_{i=1}^{\infty}$ with
Clearly $H_{0n} = P_1$ for all $n$ and $\gamma(H_{0n}) = 1$. But $\lim_{n \to \infty} \delta(H_{mn})$ does not exist for $m \ge 1$.

Theorems 2.1-2.3 characterize weak ergodicity of chains. The following 
theorem gives a characterization of strong ergodicity. It also confirms the 
intuitive feeling that strong ergodicity implies weak ergodicity. 


Theorem 2.4: A Markov chain $\{P_i\}$ is strongly ergodic if and only if for every $m$ there is a sequence of constant stochastic matrices $\{E_{mn}\}$ and a sequence of stochastic constant matrices $\{E_m\}$ such that (1) $\lim_{n \to \infty} \|H_{mn} - E_{mn}\| = 0$ and (2) $\lim_{n \to \infty} \|E_{mn} - E_m\| = 0$.


Proof: If (1) and (2) hold true, then

$$\lim_{n \to \infty} \|H_{mn} - E_m\| \le \lim_{n \to \infty} \big( \|H_{mn} - E_{mn}\| + \|E_{mn} - E_m\| \big) = 0.$$

But if (1) and (2) hold true then $E_m$ is independent of $m$. To prove this we note that $P_m H_{mn} = H_{m-1,n}$ and $P_m E_m = E_m$ [see Exercise 4 in the preliminary section] so that,
$$\|E_{m-1} - E_m\| \le \|E_{m-1} - H_{m-1,n}\| + \|P_m H_{mn} - P_m E_m\| + \|P_m E_m - E_m\|$$
$$= \|E_{m-1} - H_{m-1,n}\| + \|P_m (H_{mn} - E_m)\| \le \|E_{m-1} - H_{m-1,n}\| + \|H_{mn} - E_m\|$$

by Exercise 1.10 [and $\|P_m\| = 1$]. Taking limits on both sides we get

$$\|E_{m-1} - E_m\| = \lim_{n \to \infty} \|E_{m-1} - E_m\| \le \lim_{n \to \infty} \big( \|E_{m-1} - H_{m-1,n}\| + \|H_{mn} - E_m\| \big) = 0.$$

Thus (1) and (2) imply that the chain is strongly ergodic. Conversely, if the chain is strongly ergodic, then setting $Q = E_{mn} = E_m$ for all $m$ and $n$ we have that (1) and (2) hold true. ∎


Corollary 2.5: A strongly ergodic chain is also weakly ergodic. A weakly ergodic chain which satisfies (2) is strongly ergodic.


Proof: Strong ergodicity implies the condition (1) in Theorem 2.4, which, by Theorem 2.2, implies weak ergodicity. Conversely, by Theorem 2.2 weak ergodicity implies (1), which together with (2) implies strong ergodicity by Theorem 2.4. ∎


Corollary 2.6: Conditions (1) and (2) in Theorem 2.4 and Corollary 2.5 can be replaced by the condition (2') $\lim_{n \to \infty} \|H_{mn} - E_m\| = 0$.


Proof: $\|H_{mn} - E_m\| \le \|H_{mn} - E_{mn}\| + \|E_{mn} - E_m\| \to 0$. Conversely, (2') implies (1) and (2) with $E_{mn} = E_m$ for all $m$ and $n$.
Corollary 2.7: Condition (2) in Theorem 2.4 and Corollary 2.5 can be replaced by the condition: there is a constant stochastic matrix $E$ such that (2'') $\lim_{n \to \infty} \|E H_{mn} - E\| = 0$.
Proof: $\|E H_{mn} - E\| \le \|E H_{mn} - H_{mn}\| + \|H_{mn} - E\| \le 2\delta(H_{mn}) + \|H_{mn} - E\|$ by Corollary 1.6. Condition (1) of Theorem 2.4 implies that $\delta(H_{mn}) \to 0$ and condition (2) in that theorem implies that $\|H_{mn} - E_m\| = \|H_{mn} - E\| \to 0$ [$E_m$ is independent of $m$, as shown in the proof of that theorem]. Thus condition (2'') holds with $E = E_m$. Conversely, if (1) and (2'') hold true, then let $E_m = E$. It follows that

$$\|E_{mn} - E_m\| \le \|E_{mn} - H_{mn}\| + \|H_{mn} - E H_{mn}\| + \|E H_{mn} - E\| \to 0$$
by (1), (2''), and the fact that $\delta(H_{mn}) \to 0$. ∎


Theorem 2.8: Let $\{P_i\}$ and $\{\bar{P}_i\}$ be two Markov chains such that
$\sum_i \|P_i - \bar{P}_i\| < \infty$; then, for any $\epsilon > 0$, there is
an integer $m_0$ such that $\|H_{mn} - \bar{H}_{mn}\| < \epsilon$ for all
$m \ge m_0$ and all $n > m$ [$\bar{H}_{mn}$ is the product of the matrices
$\bar{P}_i$ corresponding to $H_{mn}$].


Proof: Let $\bar{P}_i - P_i = E_i$ with $\|E_i\| = e_i$; then
$\bar{H}_{mn} = \prod (P_i + E_i) = H_{mn} + R_{mn}$, where $R_{mn}$ contains
all possible products of $P_i$ and $E_i$ matrices. Using the facts that
$\|E_i\| = e_i$ is finite for all i, $\|P_i\| = 1$ for all i [$P_i$ is
stochastic] and $\|AB\| \le \|A\| \|B\|$ for any two matrices A and B [see
Exercise 1.10] we have that

$\|R_{mn}\| \le \sum_i e_i + \sum_{i<j} e_i e_j + \sum_{i<j<k} e_i e_j e_k + \cdots + \prod_{i=m+1}^{n} e_i = \prod_{i=m+1}^{n} (1 + e_i) - 1.$

Note that the $e_i$'s are nonnegative. Now as $\sum_i e_i < \infty$, the
product $\prod_{i=m+1}^{\infty} (1 + e_i)$ converges and, therefore, for any
$\epsilon$ there is $m_0$ with $\|R_{mn}\| < \epsilon$ for all $m \ge m_0$.
The theorem is thus proved. ∎
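The bound used in this proof can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that $\| \cdot \|$ is the row-sum norm of Section 1 and that $H_{mn} = P_{m+1} \cdots P_n$, and it uses a hypothetical pair of two-state chains with $e_i = 1/(2i^2)$, so that $\sum_i e_i$ converges.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def norm(a):
    """Row-sum norm max_i sum_j |a_ij| (assumed to be the norm || || of Section 1)."""
    return max(sum(abs(v) for v in row) for row in a)


def product(chain, m, n):
    """H_mn = P_{m+1} ... P_n, where chain[i] holds P_{i+1} (assumed indexing)."""
    h = chain[m]
    for p in chain[m + 1:n]:
        h = mat_mul(h, p)
    return h


def build_chains(length):
    """Hypothetical chains with ||P_i - Pbar_i|| = e_i = 1 / (2 i^2)."""
    chain, chain_bar, errors = [], [], []
    for i in range(1, length + 1):
        t = 1.0 / (4 * i * i)
        chain.append([[0.5, 0.5], [0.25, 0.75]])
        chain_bar.append([[0.5 - t, 0.5 + t], [0.25, 0.75]])
        errors.append(2 * t)
    return chain, chain_bar, errors


def gap_and_bound(m, n, length=60):
    """Return (||H_mn - Hbar_mn||, prod_{i=m+1}^{n} (1 + e_i) - 1)."""
    chain, chain_bar, errors = build_chains(length)
    h, h_bar = product(chain, m, n), product(chain_bar, m, n)
    diff = [[x - y for x, y in zip(r, s)] for r, s in zip(h, h_bar)]
    bound = 1.0
    for e in errors[m:n]:
        bound *= 1.0 + e
    return norm(diff), bound - 1.0
```

Calling `gap_and_bound(m, n)` for increasing m shows both numbers shrinking, which is the content of the theorem: the tail of a convergent perturbation series perturbs the tail products only slightly.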


Corollary 2.9: Let $\{P_i\}$ and $\{\bar{P}_i\}$ be two Markov chains
satisfying the conditions of Theorem 2.8. If one of the chains is weakly
ergodic, then so is the other.


Proof: Assume that $\lim_{n\to\infty} \delta(H_{mn}) = 0$, m = 0, 1, ... .
Let $C_{mn}$ be a matrix all the rows of which are equal to some row, say
$i_0$, of $H_{mn}$, and let $\bar{C}_{mn}$ be a matrix all the rows of which
are equal to the corresponding $i_0$ row of $\bar{H}_{mn}$. Then

$\|\bar{H}_{mn} - \bar{C}_{mn}\| \le \|\bar{H}_{mn} - H_{mn}\| + \|H_{mn} - C_{mn}\| + \|C_{mn} - \bar{C}_{mn}\|$

and for any $i_0$, $\|C_{mn} - \bar{C}_{mn}\| \le \|H_{mn} - \bar{H}_{mn}\|$,
since the rows of $C_{mn}$ are equal to the $i_0$ row of $H_{mn}$ and the rows
of $\bar{C}_{mn}$ are equal to the $i_0$ row of $\bar{H}_{mn}$, by definition.
Moreover, by Exercise 1.12, $\|H_{mn} - C_{mn}\| \le 2\delta(H_{mn})$, and by
Theorem 2.8 one can choose an m such that $\|\bar{H}_{mn} - H_{mn}\|$ is as
small as wanted. Let us now combine the above arguments. Given $\epsilon$,
choose $i_0$ to be an integer such that
$\delta(\bar{H}_{mn}) \le \frac{1}{2}\|\bar{H}_{mn} - \bar{C}_{mn}\| + \epsilon/2$;
this is possible by Exercise 1.12. Choose $m_0$ so that
$\|H_{mn} - \bar{H}_{mn}\| < \epsilon/3$ for all $m \ge m_0$. For any
$m \ge m_0$ there is n such that $\delta(H_{mn}) < \epsilon/6$; thus, for the
fixed $i_0$ and for any $m \ge m_0$ there is n such that

$\|\bar{H}_{mn} - \bar{C}_{mn}\| \le \frac{\epsilon}{3} + \frac{2\epsilon}{6} + \frac{\epsilon}{3} = \epsilon$

and

$\delta(\bar{H}_{mn}) \le \frac{1}{2}\|\bar{H}_{mn} - \bar{C}_{mn}\| + \frac{1}{2}\epsilon \le \epsilon.$

Finally, for $m < m_0$, we have that
$\delta(\bar{H}_{mn}) \le \delta(\bar{H}_{m,m_0})\,\delta(\bar{H}_{m_0 n}) \le \delta(\bar{H}_{m_0 n})$,
so that $\lim_{n\to\infty} \delta(\bar{H}_{mn}) = 0$. ∎


EXERCISES 


1. Prove that a Markov chain $\{P_i\}$ such that $\sum_i \gamma(P_i)$
diverges is weakly ergodic.

2. Prove that if a Markov chain $\{P_i\}$ is weakly ergodic then every
convergent subsequence of the sequence $H_{mn}$ (for fixed m) converges to a
constant matrix. [$A_i$ converges to A means that $\|A_i - A\| \to 0$.]


3. Let $\{P_i\}$ be a Markov chain. Prove: If there exists a vector $\pi$
such that $\lim_{n\to\infty} \|\pi H_{mn} - \pi\| = 0$, then also
$\lim_{n\to\infty} \|\pi P_n - \pi\| = 0$.

4. Let $\{P_i\}$ be a Markov chain such that there is $\epsilon > 0$ with
$\gamma(P_i) \ge \epsilon$ for all i, and let $\{\bar{P}_i\}$ be an arbitrary
Markov chain. Prove that there is a constant stochastic matrix S such that

$\lim_{n\to\infty} \|\bar{P}_n P_n \bar{P}_{n-1} P_{n-1} \cdots \bar{P}_1 P_1 - S\| = 0.$

Generalize this result.


The following exercises (5-11) deal with the distinction between finite and 
countable Markov chains. 


5. Prove by an example that for any $\epsilon$ there is a countable Markov
matrix $P_\epsilon$ such that $d(P_\epsilon) < \epsilon$ but
$\delta(P_\epsilon) = 1$.

6. Prove that a finite Markov chain is weakly ergodic if and only if
$\lim_{n\to\infty} d(H_{mn}) = 0$. Is the above statement true for countable
Markov chains? Explain.

7. Prove that a finite Markov chain is strongly ergodic if and only if there is
a constant stochastic matrix Q such that $\lim_{n\to\infty} |H_{mn} - Q| = 0$.
Is the above statement true for the countable case? Explain.


8. Prove that if the Markov chain in Theorem 2.2 is finite, then the condition
of that theorem can be replaced by the condition:
$\lim_{n\to\infty} |H_{mn} - E_{mn}| = 0$. Discuss the countable case.

9. Same as Exercise 8 but for Theorem 2.3 with the condition replaced by the
condition that $\lim_{n\to\infty} |\prod_i R_i| = 0$.

10. Same as Exercise 8 but for Theorem 2.4 with the conditions replaced by
the conditions
(1) $\lim_{n\to\infty} |H_{mn} - E_{mn}| = 0$


11. Prove that all the other theorems in this section can be replaced by similar
theorems with the norm "| |" replacing the norm "|| ||" whenever it occurs
and discuss the countable case.

12. Markov chains in general can be classified according to the following four
types:


Type               $\|H_{mn} - Q_m\| \to 0$    $\delta(H_{mn}) \to 0$
Strongly ergodic   Yes                         Yes
Weakly ergodic     No                          Yes
Convergent         Yes                         No
Oscillating        No                          No


where $\|H_{mn} - Q_m\| \to 0$ means that for any m, there is a matrix $Q_m$
(not necessarily constant) such that
$\lim_{n\to\infty} \|H_{mn} - Q_m\| = 0$. In Corollary 2.9 it is proved
that if two chains satisfy the conditions of Theorem 2.8 and one of them is
weakly ergodic then so is the other. Prove that the same is true for all the
other three types of chains above.


13. Prove that if all the matrices in the Markov chain are equal one to the 
other (the chain is homogeneous), then weak ergodicity implies strong ergo- 
dicity. 

14. Prove that if all the matrices in a Markov chain are doubly stochastic, then 
weak ergodicity implies that the matrices are of finite order and implies strong 
ergodicity. 


3. Nonhomogeneous Markov Systems 


The difference between Markov systems, to be introduced in this section, 
and Markov chains, discussed in the previous section, is that in the Markov 
system model one studies the set of all possible products of Markov matrices 
taken from a (finite) given set of such matrices, while in the Markov chain 
model one investigates a specific given infinite product of Markov matrices 
and its possible subproducts. The approach in this section is closer to the auto- 
maton concept where the set of all words over a given alphabet is studied with 
regard to the transitions induced on the states of the automaton by the different 
words. 
The words correspond here to products of Markov matrices which induce a
probabilistic transition between the states of the automaton.

It is to be noted, however, that a homogeneous Markov chain is a particular
case of both a nonhomogeneous Markov chain and a Markov system: the case
where only one Markov matrix and its powers are considered.


Definition 3.1: A Markov system over a [finite] alphabet $\Sigma$ is a pair
$(S, \{A(\sigma)\})$ where S is a [at most countable] set of states and
$\{A(\sigma)\}$ is a set of Markov matrices [representing the transitions
between the states] such that the matrix $A(\sigma)$ is associated with the
symbol $\sigma \in \Sigma$.


Notation: If x is a word in $\Sigma^*$ (the set of all words over $\Sigma$,
including the empty word denoted by $\lambda$) such that
$x = \sigma_1 \cdots \sigma_k$, then
$A(x) = A(\sigma_1)A(\sigma_2) \cdots A(\sigma_k)$; $A(x) = [a_{ij}(x)]$ and
$a_{ij}(x)$ is the transition probability from state i to state j associated
with the word x.

It will be assumed that the alphabet $\Sigma$ is finite. We shall, however,
mention later some of the implications induced by the assumption that
$\Sigma$ is infinite. When two systems are compared it is always assumed that
they are over the same $\Sigma$.
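The notation can be made concrete with a short sketch. It is only an illustration: the two-letter system below is hypothetical, matrices are plain lists of rows, and taking the matrix of the empty word to be the identity is an assumption that the text does not state.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """n x n unit matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def word_matrix(system, word):
    """A(x) = A(s1) A(s2) ... A(sk) for the word x = s1 s2 ... sk.

    The empty word is mapped to the identity (an assumed convention).
    """
    n = len(next(iter(system.values())))
    result = identity(n)
    for symbol in word:
        result = mat_mul(result, system[symbol])
    return result


# Hypothetical two-state system over the alphabet {a, b}.
system = {
    "a": [[0.5, 0.5], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.5, 0.5]],
}
a_ab = word_matrix(system, "ab")  # the matrix A(ab) = A(a) A(b)
```

The entry `a_ab[i][j]` is the probability of moving from state i to state j when the word ab is read, letter by letter, from left to right.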


Definition 3.2: A Markov system $(S, \{A(\sigma)\})$ is weakly ergodic if for
any $\epsilon > 0$ there is an integer $n = n(\epsilon)$ such that
$\delta(A(x)) < \epsilon$ for all words x such that $l(x) \ge n(\epsilon)$,
where $l(x)$ denotes the length of the word x.


Remark: If a Markov system is weakly ergodic, then $\delta(A(x)) \to 0$
uniformly, the magnitude of $\delta(A(x))$ depending only on the length of x
and not on the specific symbols contained in x. Such a requirement of
uniformity would be too restrictive for strong ergodicity, and therefore
strong ergodicity will not be dealt with for Markov systems.

Note that $A(xy) = A(x)A(y)$, so that
$\delta(A(xy)) \le \delta(A(x))\delta(A(y)) \le \delta(A(x))$, and therefore
$\delta(A(x)) < \epsilon$ for all x with $l(x) = n(\epsilon)$ if and only if
$\delta(A(x)) < \epsilon$ for all x with $l(x) \ge n(\epsilon)$.


Theorem 3.1: A Markov system is weakly ergodic if and only if there is an
integer k such that $\delta(A(x)) < 1$ for all x with $l(x) = k$.


Proof: Necessity follows directly from the definition. To prove sufficiency
set $\delta = \max_{l(x)=k} \delta(A(x)) < 1$ [there are only finitely many
words x with $l(x) = k$ because $\Sigma$ is finite]. Let n be an integer such
that $\delta^n < \epsilon$ for a given $\epsilon > 0$. Let x be a word such
that $l(x) \ge kn$; then $x = y_1 \cdots y_n y$ where
$l(y_1) = \cdots = l(y_n) = k$ and $l(y) \ge 0$. Thus,
$\delta(A(x)) \le \delta(A(y_1)) \cdots \delta(A(y_n))\,\delta(A(y)) \le \delta^n < \epsilon$.
It follows that the system is weakly ergodic. ∎


Remark: The theorem will remain true even if the alphabet $\Sigma$ is
infinite, provided the requirement that $\delta(A(x)) < 1$ is replaced by the
requirement that there is a real number $\delta < 1$ such that
$\delta(A(x)) \le \delta$ for all x with $l(x) = k$.
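For a finite alphabet the criterion of Theorem 3.1 can be tested by brute force, as in the following sketch. It assumes that $\delta(P)$ is the coefficient of Section 1 given by the largest value, over pairs of rows, of the sum of the positive parts of the row differences; the two systems are hypothetical, and the search is exponential in k, so it is meant only for very small cases.

```python
from itertools import product


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def delta(p):
    """Assumed Section 1 coefficient: max over row pairs of sum_j (p_i1j - p_i2j)^+."""
    return max(sum(max(x - y, 0.0) for x, y in zip(r, s)) for r in p for s in p)


def word_matrix(system, word):
    """A(x) for a nonempty word x, read left to right."""
    m = system[word[0]]
    for symbol in word[1:]:
        m = mat_mul(m, system[symbol])
    return m


def ergodicity_order(system, max_k, tol=1e-9):
    """Smallest k <= max_k with delta(A(x)) < 1 for every word of length k, else None.

    A small tolerance guards against rounding; None only means that no such k
    was found up to max_k.
    """
    for k in range(1, max_k + 1):
        worst = max(delta(word_matrix(system, w)) for w in product(system, repeat=k))
        if worst < 1.0 - tol:
            return k
    return None


# Hypothetical one-letter system: delta(A(a)) = 1 but delta(A(aa)) = 0.75.
slow = {"a": [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.5, 0.5, 0.0]]}
# Hypothetical system containing the unit matrix: every power of it has delta = 1.
stuck = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[0.5, 0.5], [0.5, 0.5]]}
```

Here `ergodicity_order(slow, 4)` finds k = 2, while `ergodicity_order(stuck, 3)` returns `None`, since the words consisting of the letter a alone never contract.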


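Theorem 3.2: Let $(S, \{A(\sigma)\})$ and $(S, \{\bar{A}(\sigma)\})$ be two
systems such that the first is weakly ergodic and the second is arbitrary.
There is $\epsilon > 0$ such that if
$\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon$ for all $\sigma \in \Sigma$, then
the second system is also weakly ergodic.

Proof: Using Theorem 3.1, we must prove that there is $\epsilon$ such that if
$\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon$ for all $\sigma \in \Sigma$, then
there is n such that $\delta(\bar{A}(x)) < 1$ for all x with $l(x) \ge n$. Let
$A_{i_0}(x)$ be the matrix all of whose rows are equal to the $i_0$ row of
$A(x)$; then $\|A(x) - A_{i_0}(x)\| \le 2\delta(A(x))$ by Exercise 1.12. As
the first system is weakly ergodic, there is $n_0$ such that
$\delta(A(x)) < \frac{1}{6}$ for all x with $l(x) \ge n_0$, i.e.,
$\|A(x) - A_{i_0}(x)\| < \frac{1}{3}$ for all such x and any $i_0$. Let
$\bar{x}$ be a fixed but arbitrary word with $l(\bar{x}) = n_0$ and choose
$i_0$ so that
$\delta(\bar{A}(\bar{x})) \le \frac{1}{2}\|\bar{A}(\bar{x}) - \bar{A}_{i_0}(\bar{x})\| + \frac{1}{2}$.
Such an $i_0$ exists by Exercise 1.12. Finally, let $\epsilon$ be a number
with $0 < \epsilon < 1/(3n_0)$, and let
$\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon$ for all $\sigma \in \Sigma$.
Then, by Exercise 1.11, we have that $\|A(x) - \bar{A}(x)\| < \frac{1}{3}$
(for $l(x) = n_0$). Thus,

$\delta(\bar{A}(\bar{x})) \le \frac{1}{2} + \frac{1}{2}\|\bar{A}(\bar{x}) - \bar{A}_{i_0}(\bar{x})\|$

$\le \frac{1}{2} + \frac{1}{2}\|\bar{A}(\bar{x}) - A(\bar{x})\| + \frac{1}{2}\|A(\bar{x}) - A_{i_0}(\bar{x})\| + \frac{1}{2}\|A_{i_0}(\bar{x}) - \bar{A}_{i_0}(\bar{x})\|$

$< \frac{1}{2} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 1$

since $\|A_{i_0}(\bar{x}) - \bar{A}_{i_0}(\bar{x})\| \le \|A(\bar{x}) - \bar{A}(\bar{x})\| < \frac{1}{3}$.
But $\bar{x}$ is arbitrary and therefore we have that
$\delta(\bar{A}(x)) < 1$ for all x with $l(x) = n_0$, provided that
$\|A(\sigma) - \bar{A}(\sigma)\| < 1/(3n_0)$ for all $\sigma \in \Sigma$,
where $n_0$ is an integer such that $\delta(A(x)) < \frac{1}{6}$ for all x
with $l(x) \ge n_0$. To complete the proof we note that if
$\delta(\bar{A}(x)) < 1$ for all x with $l(x) = n_0$, then this is true also
for all x with $l(x) \ge n_0$, as mentioned before. ∎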


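Theorem 3.3: Let $(S, \{A(\sigma)\})$ and $(S, \{\bar{A}(\sigma)\})$ be two
systems such that the first system is weakly ergodic. For any $\delta > 0$,
there is $\epsilon > 0$ such that if
$\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon$ for all $\sigma \in \Sigma$, then
$\|A(x) - \bar{A}(x)\| < \delta$ for all $x \in \Sigma^*$.


Proof: By the previous theorem, there is $\epsilon_1$ such that
$\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon_1$ for all $\sigma \in \Sigma$
implies that both systems are weakly ergodic. Thus there is $\epsilon_1$ such
that there is $n_0$ with both $\delta(A(x)) < \delta/6$ and
$\delta(\bar{A}(x)) < \delta/6$ for all x with $l(x) \ge n_0$ and the given
$\delta$, provided that $\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon_1$. For
the number $n_0$ above, there is $\epsilon_2$ such that if
$\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon_2$ then
$\|A(x) - \bar{A}(x)\| < \delta/3$ for all x with $l(x) \le n_0$ [this follows
from Exercise 1.11]. Let $\epsilon = \min(\epsilon_1, \epsilon_2)$. Then for
all x with $l(x) \le n_0$, $\|A(x) - \bar{A}(x)\| < \delta/3 < \delta$. If
$x = yz$ with $l(z) = n_0$ and $l(y) > 0$, i.e., if $l(x) > n_0$, then, using
Corollary 1.6, we have that

$\|A(x) - \bar{A}(x)\| = \|A(yz) - \bar{A}(yz)\|$

$\le \|A(y)A(z) - A(z)\| + \|\bar{A}(y)\bar{A}(z) - \bar{A}(z)\| + \|A(z) - \bar{A}(z)\|$

$\le 2\delta(A(z)) + 2\delta(\bar{A}(z)) + \|A(z) - \bar{A}(z)\|$

$< \frac{\delta}{3} + \frac{\delta}{3} + \frac{\delta}{3} = \delta.$ ∎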
Remarks: Theorem 3.3 provides an interesting application: Assume that a
system $(S, \{A(\sigma)\})$ is given together with an initial distribution
$\pi$ over the states, and it is required to compute the values of the vector
$\pi A(x)$ for some word $x = \sigma_1 \cdots \sigma_k$. If the number of
states is countably infinite, then it will be impossible to compute the exact
values of the entries of $\pi A(x)$. If the system is weakly ergodic, then
using Theorem 3.3 one can change the vector $\pi$ into a new vector
$\bar{\pi}$ such that $\|\pi - \bar{\pi}\| < \epsilon$ and $\bar{\pi}$ has
only finitely many nonzero entries. The rows of $A(\sigma_1)$ corresponding to
zero values in $\bar{\pi}$ can be replaced by zero rows, and one can choose
finitely many columns in the remaining rows so that by replacing the other
columns by zero columns one gets a matrix $\bar{A}(\sigma_1)$ such that
$\|A(\sigma_1) - \bar{A}(\sigma_1)\| < \epsilon$ and $\bar{A}(\sigma_1)$ has
only finitely many nonzero entries. The process is repeated for
$A(\sigma_2), \ldots, A(\sigma_k)$. As $\|\pi - \bar{\pi}\| < \epsilon$ and
$\|A(\sigma_i) - \bar{A}(\sigma_i)\| < \epsilon$, we have
$\|\pi A(x) - \bar{\pi}\bar{A}(x)\| < \delta$ with $\epsilon$ a function of
$\delta$. An infinite computation can thus be replaced by a finite
computation and the resulting error can be kept under control. Theorem 3.3
may also be used for rounding off the entries in the individual matrices
$A(\sigma)$ [in order to simplify the computation, or to make computation
possible when the entries are irrational] and keeping the resulting error in
long computations under control.

Because of the importance of Theorem 3.3, one is induced to ask whether
the condition of that theorem is best [i.e., whether it is also a necessary
condition for the theorem to hold true]. That this is not the case is shown by
Exercise 3.4. On the other hand it is clear that the theorem is not true in
general, e.g., let I be the unit matrix of order n and let P be any doubly
stochastic matrix such that $\|P - I\| < \epsilon$ and such that
$\delta(P) < 1$; then, independently of $\epsilon$, we have that
$\lim_{m\to\infty} P^m = E$ where E is a matrix such that all its entries are
equal to 1/n [see Exercise 1.13c]. Thus, for large enough m and n > 2,

$\|I^m - P^m\| \ge \|I - E\| - \|E - P^m\| = 1 + \frac{n-2}{n} - \|E - P^m\| > 1$

independently of $\epsilon$.

One additional question with regard to Theorem 3.3 to be considered here
is the following: Assume that we drop the requirement that the first system is
ergodic and require instead that $\bar{A}(\sigma)$ has zero entries in the
same places where $A(\sigma)$ has zero entries [i.e., no new transitions are
added in the approximation]. Is this new condition necessary or sufficient, or
both, for the theorem to hold true? It is clear that this new condition is not
necessary, since a weakly ergodic system may have zero entries in its
$A(\sigma)$ matrices and the theorem does not impose any restrictions on the
corresponding entries in the matrices $\bar{A}(\sigma)$. The following example
will show that the above condition is not sufficient either.
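Example: Let

$A(\sigma_1) = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & q & 1-q \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad A(\sigma_2) = \begin{pmatrix} 1-p & 0 & p & 0 \\ 1 & 0 & 0 & 0 \\ q & 0 & 1-q & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix};$

then, by straightforward computation one finds that

$A(\sigma_1^n) = \begin{pmatrix} p^n & 1-p^n & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & q^n & 1-q^n \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad A(\sigma_1^n \sigma_2) = \begin{pmatrix} 1-p^{n+1} & 0 & p^{n+1} & 0 \\ 1 & 0 & 0 & 0 \\ q^{n+1} & 0 & 1-q^{n+1} & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$

Denote $\sigma_1^n \sigma_2$ by $x_n$. It is easily seen that if words of the
form $x_n$ only are considered, then the subsystem consisting of the first and
third states is independent of the other two states, i.e., if $A'(x_n)$
denotes the submatrix of $A(x_n)$ corresponding to the first and third states,

$A'(x_n) = \begin{pmatrix} 1-p^{n+1} & p^{n+1} \\ q^{n+1} & 1-q^{n+1} \end{pmatrix},$

then $A'(x_n x_m) = A'(x_n)A'(x_m)$. Let now $p = q = \frac{1}{2}$. In this
case

$A'(x_n) = \begin{pmatrix} 1-\frac{1}{2^{n+1}} & \frac{1}{2^{n+1}} \\ \frac{1}{2^{n+1}} & 1-\frac{1}{2^{n+1}} \end{pmatrix}.$

$A'(x_n)$ being doubly stochastic, we have that [see Exercise 1.13c]

$\lim_{m\to\infty} A'(x_n^m) = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}.$

But if $p = \frac{1}{2}$ and $q = \frac{1}{2} - \epsilon$ with
$0 < \epsilon < \frac{1}{2}$, then the matrix $A'(x_n)$ will have the form

$A'(x_n) = \begin{pmatrix} 1-(\frac{1}{2})^{n+1} & (\frac{1}{2})^{n+1} \\ (\frac{1}{2}-\epsilon)^{n+1} & 1-(\frac{1}{2}-\epsilon)^{n+1} \end{pmatrix}$

and

$\lim_{m\to\infty} A'(x_n^m) = \frac{1}{(\frac{1}{2})^{n+1} + (\frac{1}{2}-\epsilon)^{n+1}} \begin{pmatrix} (\frac{1}{2}-\epsilon)^{n+1} & (\frac{1}{2})^{n+1} \\ (\frac{1}{2}-\epsilon)^{n+1} & (\frac{1}{2})^{n+1} \end{pmatrix}.$

And for any $\epsilon > 0$ and $\delta > 0$, there is n with

$\frac{(\frac{1}{2}-\epsilon)^{n+1}}{(\frac{1}{2})^{n+1} + (\frac{1}{2}-\epsilon)^{n+1}} < \delta$

(let the reader prove this fact). Let $B(\sigma_i)$ be the matrix
$A(\sigma_i)$ when $p = q = \frac{1}{2}$ and let $\bar{B}(\sigma_i)$ be the
matrix $A(\sigma_i)$ when $p = \frac{1}{2}$ and
$q = \frac{1}{2} - \epsilon$, $0 < \epsilon < \frac{1}{2}$. Then
$\|B(\sigma_i) - \bar{B}(\sigma_i)\| \le 2\epsilon$, but for any such
$\epsilon$ there is a word of the form $x_n^m$ such that the 1, 1 entry of
$B(x_n^m)$ is bigger than $\frac{1}{2}$ and the corresponding entry in
$\bar{B}(x_n^m)$ is smaller than $\frac{1}{4}$, so that
$\|B(x_n^m) - \bar{B}(x_n^m)\| > \frac{1}{4}$, which shows that the
consequence of Theorem 3.3 is not true for this example although
$\bar{B}(\sigma_i)$ has nonzero entries in the same places as $B(\sigma_i)$.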
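The effect described in this example can be observed numerically. The sketch below is an illustration under stated assumptions: it works directly with the 2 x 2 submatrix of $A(x_n)$ on the first and third states, taken to be $[[1 - p^{n+1}, p^{n+1}], [q^{n+1}, 1 - q^{n+1}]]$, and the values $\epsilon = 0.05$, n = 10 and $m = 2^{13}$ are chosen only for the demonstration.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, m):
    """a**m for m >= 1, by repeated squaring."""
    result = None
    while m:
        if m & 1:
            result = a if result is None else mat_mul(result, a)
        a = mat_mul(a, a)
        m >>= 1
    return result


def sub_matrix(p, q, n):
    """Assumed form of the submatrix of A(x_n) on the first and third states."""
    a, b = p ** (n + 1), q ** (n + 1)
    return [[1 - a, a], [b, 1 - b]]


eps, n, m = 0.05, 10, 2 ** 13          # illustrative values only
exact = mat_pow(sub_matrix(0.5, 0.5, n), m)            # p = q = 1/2
perturbed = mat_pow(sub_matrix(0.5, 0.5 - eps, n), m)  # q lowered by eps

# Row-sum norm of the difference between the two products.
distance = max(sum(abs(x - y) for x, y in zip(r, s))
               for r, s in zip(exact, perturbed))
```

Although the two systems differ by at most $2\epsilon = 0.1$ letter by letter, `exact[0][0]` stays above 1/2 while `perturbed[0][0]` drops below 1/4, so `distance` exceeds 1/4.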


EXERCISES 


1. Discuss Theorems 3.2 and 3.3 in the case where $\Sigma$ is infinite.


2. Show that if all the Markov systems considered are finite then all the
theorems of this section are true with the norm "|| ||" replaced by the norm
"| |" and $\delta$ replaced by d.


3. Prove that any system $(S, \{A(\sigma)\})$ such that |S| = 2 is weakly
ergodic if and only if the matrices
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and
$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ are not included in the set
$\{A(\sigma)\}$.


4*. Prove the following proposition: Let $(S, \{A(\sigma)\})$ and
$(S, \{\bar{A}(\sigma)\})$ be two systems such that |S| = 2. For any
$\delta > 0$, there is $\epsilon > 0$ such that if


1. $\|A(\sigma) - \bar{A}(\sigma)\| < \epsilon$ for all $\sigma \in \Sigma$
with $A(\sigma) \ne \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and
$A(\sigma) \ne \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, and

2. if $A(\sigma) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ or
$A(\sigma) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, then
$\bar{A}(\sigma) = A(\sigma)$,


then $\|A(x) - \bar{A}(x)\| < \delta$.


5. Prove that a weakly ergodic system such that all its matrices are doubly
stochastic has only finitely many states, is strongly ergodic, and the limiting
matrix is such that all its entries are equal. [A system is strongly ergodic if
$\lim_{l(x)\to\infty} \|A(x) - Q\| = 0$ for some constant stochastic matrix Q.]


4. Graph Properties and Decision Problems 


Up to this section no restriction was assumed with regard to the finiteness or 
infiniteness of the Markov chains or systems considered. In this section, how- 
ever, we shall assume that the chains or systems have only finitely many states. 
This restriction will enable us to simplify the classification of the states of a 
chain. In addition we shall be able to prove some decidability theorems under 
the finiteness restriction although it is not known whether these theorems are 
true in the infinite case. Some of the difficulties encountered in this case will 
be illustrated in the exercises following this section. For more information on 
infinite homogeneous Markov chains, the reader is referred to the books by 
Kemeny, Snell, and Knapp (1966), and Feller (1958). 

Given a Markov matrix $P = [p_{ij}]$ with state set S, the graph associated
with P is a pair $(S, \Gamma)$ where $\Gamma$ is a binary relation on S
($\Gamma \subseteq S \times S$) such that $(i, j) \in \Gamma$ if and only if
$p_{ij} > 0$. If $i \in S$, then $i\Gamma$ denotes the set of states

$i\Gamma = \{j : (i, j) \in \Gamma\}.$

A sequence of states $(i_0, i_1, \ldots, i_n)$ is a path of length n if every
pair of adjacent states in the sequence is in $\Gamma$. A state j is a
consequent of length n of i if there is a path of length n beginning with i and
ending with j. A pair of states i, j have a common consequent (of order n) if
there is an integer n such that $i\Gamma^n \cap j\Gamma^n \ne \emptyset$,
where $\Gamma^n$ means the composition of $\Gamma$ with itself n times
[$(i, j) \in \Gamma^2$ if and only if there is k such that
$(i, k) \in \Gamma$ and $(k, j) \in \Gamma$]. The graph is strongly connected
if there is a path connecting any pair of states.

We are now able to classify the states of a given graph $(S, \Gamma)$. A state
is called transient if it has a consequent of which it is not itself a
consequent. A state which is not transient is nontransient.
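These definitions translate directly into a small program. The sketch below is illustrative only: states are numbered from 0, the graph is stored as a list of successor sets, consequents are taken along paths of length at least 1 (an assumed reading of the definition), and the three-state matrix is hypothetical.

```python
def graph_of(p):
    """Successor sets of the graph of P: state i maps to {j : p[i][j] > 0}."""
    return [{j for j, v in enumerate(row) if v > 0} for row in p]


def consequents(gamma, i):
    """All states reachable from i by a path of length >= 1."""
    seen, stack = set(), list(gamma[i])
    while stack:
        j = stack.pop()
        if j not in seen:
            seen.add(j)
            stack.extend(gamma[j])
    return seen


def transient_states(gamma):
    """States having a consequent of which they are not themselves a consequent."""
    reach = [consequents(gamma, i) for i in range(len(gamma))]
    return {i for i in range(len(gamma))
            if any(i not in reach[j] for j in reach[i])}


# Hypothetical matrix: state 0 may leave for the closed cycle {1, 2}.
p = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
gamma = graph_of(p)
```

For this matrix `transient_states(gamma)` is `{0}`: state 1 is a consequent of state 0, but state 0 is not a consequent of state 1.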


Remarks 
1. It is decidable whether a given state is transient or not [see Exercise 4.1]. 


2. There must be nontransient states in any graph. Otherwise, one can
construct an infinite sequence of states $i_1, i_2, \ldots, i_k, \ldots$ such
that for k > j, $i_k$ is a consequent of $i_j$ and $i_j$ is not a consequent
of $i_k$ (the relation of consequence is transitive). All the states in the
sequence must therefore differ one from the other, and, as the chain is
finite, this is impossible.


3. If i is a nontransient state and j is a consequent of i, then j is also
nontransient. For let k be any consequent of j; then k is a consequent of i,
which implies that i is a consequent of k [i is nontransient], which implies
that j is a consequent of k [since j is a consequent of i].


The set of nontransient states is divided into ergodic classes, where an ergodic 
class is a maximal strongly connected set of states and two states belong to the 
same ergodic class if and only if they are consequents of each other. In order 
to be able to proceed with the classification we need now the following: 


Lemma: A set of positive integers that is closed under addition contains all 
sufficiently large multiples of its greatest common divisor. 


Proof: Let d be the gcd of the given set of numbers; then there is a finite
set of these numbers, say $n_1, n_2, \ldots, n_k$, such that d is their gcd.
[Let $n_1$ be the first number in the set. If $n_1 = d$, we are done; if
$n_1 > d$, then there is an $n_2$ such that $d_2$, the gcd of $(n_1, n_2)$,
satisfies $d \le d_2 < n_1$. If $d_2 = d$, we are done. If $d_2 \ne d$, we
continue the process, getting a sequence of numbers $n_1, n_2, n_3, \ldots$
which must terminate as the $d_i$'s are decreasing.] By a well-known theorem
of arithmetic, there are integers [negative or positive]
$a_1, a_2, \ldots, a_k$ such that $a_1 n_1 + \cdots + a_k n_k = d$. Let m be
the positive part and let n be the absolute value of the negative part in the
left-hand side of the above equation, so that $d = m - n$. Then m and n are
numbers in the given set [for the set is closed under addition].

Let q be any number; then q can be written in the form $q = an + b$ with
$0 \le b \le n - 1$. Multiplying by d we get $dq = dan + db$. But
$d = m - n$, so that $db = (m - n)b$ and
$dq = dan + (m - n)b = (da - b)n + bm$. Thus for any q such that
$a \ge (n - 1)/d$ the value $da - b$ will be nonnegative, with the result
that dq is in the set. The lemma is thus proved. ∎
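The statement of the lemma can be explored on small cases. The following sketch is only a finite check up to a chosen limit, not a proof: it closes a hypothetical set of generators under addition and reports the largest multiple of the gcd, below the limit, that is missing from the closure.

```python
from functools import reduce
from math import gcd


def additive_closure(generators, limit):
    """All sums of one or more generators (repetition allowed) that are <= limit."""
    reachable = [False] * (limit + 1)
    for s in range(1, limit + 1):
        reachable[s] = s in generators or any(
            s > g and reachable[s - g] for g in generators)
    return {s for s in range(1, limit + 1) if reachable[s]}


def largest_missing_multiple(generators, limit):
    """Largest multiple of the gcd, up to limit, not in the closure (0 if none)."""
    d = reduce(gcd, generators)
    closed = additive_closure(generators, limit)
    missing = [s for s in range(d, limit + 1, d) if s not in closed]
    return max(missing) if missing else 0
```

For example, with the generators 6, 10 and 15 (gcd 1) the largest missing number below 200 is 29, and with the generators 4 and 6 (gcd 2) the only missing multiple of 2 is 2 itself.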


Let E be an ergodic class of states, let i and j be two states in E and let
$N_{ij}$ be the set of integers $n_{ij}$ such that there is a path of length
$n_{ij}$ connecting the two states i and j. The sets $N_{ij}$ are not empty
by the definition of E. Consider now the two sets of integers $N_{ii}$ and
$N_{jj}$ and let $d_i$ and $d_j$ be their gcd respectively. By the previous
lemma we have that for sufficiently large k, $kd_j \in N_{jj}$ [since the
sets $N_{ii}$ are clearly closed under addition]. Let $a \in N_{ij}$ and
$c \in N_{ji}$ be two integers; then for sufficiently large k,
$a + kd_j + c \in N_{ii}$. It follows that $d_i$ divides $a + kd_j + c$, and,
since $d_i$ also divides $(a + c) \in N_{ii}$, we have that $d_i$ also divides
$kd_j$ for all sufficiently large k. But this is possible only if $d_i$
divides $d_j$. Similarly $d_j$ divides $d_i$, or $d_i = d_j$. The consequence
is that all the sets $N_{ii}$ have the same gcd, to be denoted by d. Let a and
b be integers in $N_{ij}$ and let c be in $N_{ji}$. Then $a + c \in N_{ii}$
and $b + c \in N_{ii}$, so that $a + c \equiv 0 \pmod d$ and
$b + c \equiv 0 \pmod d$, or $a \equiv b \pmod d$. It follows that all the
integers in $N_{ij}$ are congruent to each other (mod d) and in particular to
the smallest integer in $N_{ij}$, to be denoted by $t_{ij}$. [If $i \ne j$,
then $t_{ij} \ne 0$, and if i = j we may define $t_{ii} = 0$, for any integer
in $N_{ii}$ is congruent to 0 (mod d).]

We are now able to divide any ergodic class E into periodic subclasses as
follows: Two states i and j in E are in the same periodic class if and only if
$j \in i\Gamma^n$ and $n \equiv 0 \pmod d$. It is easy to see that the
relation of being in the same periodic class is an equivalence relation (see
Exercise 4.3), and any ergodic class is thus subdivided into exactly d
periodic subclasses [d is the gcd of the sets $N_{ii}$]
$C_1, C_2, \ldots, C_d$, where any path connecting a state in $C_i$ to a state
in $C_j$, i < j, has length n with $n \equiv j - i \pmod d$ and
$j - i = t_{ij}$.
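The classification into ergodic classes and periodic subclasses can be computed mechanically. The sketch below is an illustration on a hypothetical eight-state graph, not the graph of any figure in the text; it obtains the period d as the gcd of the level differences along the edges of a breadth-first search, a standard technique that is swapped in here in place of the sets $N_{ii}$ used in the argument above.

```python
from math import gcd


def reachable(gamma, i):
    """All states reachable from i by a path of length >= 1."""
    seen, stack = set(), list(gamma[i])
    while stack:
        j = stack.pop()
        if j not in seen:
            seen.add(j)
            stack.extend(gamma[j])
    return seen


def ergodic_classes(gamma):
    """Ergodic classes: for a nontransient state i the class is its reachable set."""
    reach = [reachable(gamma, i) for i in range(len(gamma))]
    classes = []
    for i in range(len(gamma)):
        if all(i in reach[j] for j in reach[i]):      # i is nontransient
            cls = frozenset(reach[i])
            if cls not in classes:
                classes.append(cls)
    return classes


def period_and_subclasses(gamma, cls):
    """Period d of an ergodic class and its periodic subclasses."""
    start = min(cls)
    level = {start: 0}
    queue = [start]
    d = 0
    for u in queue:                    # the list grows while it is scanned (BFS)
        for v in gamma[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            else:
                d = gcd(d, level[u] + 1 - level[v])
    groups = {}
    for state, depth in level.items():
        groups.setdefault(depth % d, set()).add(state)
    return d, sorted(groups.values(), key=min)


# Hypothetical graph: state 0 is transient, {1, 2, 3} is a 3-cycle and
# {4, 5, 6, 7} alternates between {4, 6} and {5, 7}.
gamma = [{1, 4}, {2}, {3}, {1}, {5, 7}, {4, 6}, {5, 7}, {4, 6}]
classes = ergodic_classes(gamma)
```

Here `period_and_subclasses(gamma, classes[0])` gives period 3 with three singleton subclasses, and `period_and_subclasses(gamma, classes[1])` gives period 2 with subclasses {4, 6} and {5, 7}.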


Example: Consider the graph in Figure 12. The states 1, 2, 3 are transient.
The set of states {6, 7, 10} is an ergodic class and the sets {6}, {7}, {10}
are its periodic subclasses with d = 3. The set of states {4, 5, 8, 9} is
another ergodic class and the sets {4, 8} and {5, 9} are its periodic
subclasses with d = 2. Note that, by a proper rearrangement of the states, the
matrix whose graph is as above can be written in the form shown in Figure 13
(nonzero entries are represented by a x sign). Thus every ergodic class
($E_1$, $E_2$) is represented in a square main diagonal submatrix with all the
entries in the remaining parts of the corresponding rows being zero. The
periodic subclasses are represented in [not necessarily] square submatrices
filling the intersection of a set of rows corresponding to one periodic class
with a set of columns corresponding to another periodic class in the same
ergodic class. All the other entries in the corresponding rows are zero.


Figure 12. Schematic representation of a transition graph.

The rearrangement of states, illustrated above, is possible in general and any 
stochastic matrix can be rearranged so as to have the above form. 


Definition 4.1: A matrix is SIA (stochastic, indecomposable, and aperiodic) if 
it is stochastic and its graph has only one ergodic class with period d = 1 [i.e., 
there are no periodic subclasses]. 
Figure 13. Canonical representation of stochastic matrix (rows and columns
ordered by state: 4, 8, 5, 9, 6, 7, 10, 1, 2, 3).


Definition 4.2: A stochastic matrix satisfies the condition $H_2$ if every
pair of states in the associated graph has a common consequent.


Lemma 4.1: A stochastic matrix is SIA if and only if it satisfies the
condition $H_2$.


Proof: Let P be an SIA matrix of order n, and let i and j be two of its
states. There is m ($\le n - 1$) such that both i and j have consequent states
i' and j' of order m and i', j' are nontransient. Since there is only one
ergodic set, which is not periodic, there is an $m_1$ such that
$m_1 \in N_{i'i'}$ and $m_1 \in N_{j'i'}$, so that i' is a common consequent
of order $m + m_1$ of both i and j.

Assume now that P satisfies $H_2$ and that there are several ergodic classes
$G_1, G_2, \ldots, G_r$ in the graph. Let $i_1 \in G_1$ and $i_2 \in G_2$ be a
pair of vertices in different classes; then $i_1$ and $i_2$ have a common
consequent k which is nontransient (k is a consequent of nontransient states).
Hence $i_1$ and $i_2$ are consequents of k, which implies that $k \in G_1$ and
$k \in G_2$, or $G_1 = G_2$. It follows that there is a single ergodic class
in the graph. Assume that the ergodic class is divisible into several periodic
subclasses $C_1, \ldots, C_d$ and let $i_1$ and $i_j$ be a pair of vertices in
different classes $C_1$ and $C_j$ respectively. Then $i_1$ and $i_j$ have a
common consequent k which is nontransient and belongs, therefore, to a
periodic class $C_k$. Then k is a consequent of order k - 1 (mod d) of $i_1$
and a consequent of order k - j (mod d) of $i_j$, and [since k is a common
consequent of both $i_1$ and $i_j$] $k - 1 \equiv k - j \pmod d$, or
$1 \equiv j \pmod d$, or $C_1 = C_j$. Thus there is no periodic subdivision of
the ergodic class and the proof is complete. ∎




Lemma 4.2: Let $(S, \Gamma)$ be a graph with n states. If a pair of states i
and j, $i, j \in S$, has a common consequent, then it has a common consequent
of order v where $v \le n(n - 1)/2$.


Proof: If states i and j have a common consequent, then there exists a
sequence of (unordered) pairs of states [with $i = i_0$, $j = j_0$]

$(i_0, j_0), (i_1, j_1), \ldots, (i_u, j_u)$

such that (1) $i_k \ne j_k$, k = 0, 1, 2, ..., u - 1; (2)
$i_k \in i\Gamma^k$, $j_k \in j\Gamma^k$; (3) $i_u = j_u$.

If the sequence contains two equal pairs, then omit the part of the sequence
between these pairs, including the second of the equal pairs. Repeat this
procedure until a reduced sequence is obtained

$(i_0', j_0'), (i_1', j_1'), \ldots, (i_v', j_v')$

such that (1') $i_k' \ne j_k'$, k = 0, 1, ..., v - 1; (2')
$i_k' \in i\Gamma^k$, $j_k' \in j\Gamma^k$; (3')
$(i_k', j_k') \ne (i_j', j_j')$ for $k \ne j$, k, j = 0, 1, 2, ..., v; (4')
$i_v' = j_v'$.
Now by (2') and (4'), $i_v' = j_v'$ is a common consequent of order v of the
states i and j, while by (1') and (3'), v is at most $n(n - 1)/2$. ∎
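Lemma 4.2 gives a stopping rule for a direct search. The following minimal sketch assumes that the graph is stored as a list of successor sets with states numbered from 0 (an illustrative representation); it advances the two sets $i\Gamma^t$ and $j\Gamma^t$ step by step and gives up once t exceeds n(n - 1)/2.

```python
def common_consequent_order(gamma, i, j):
    """Smallest t <= n(n-1)/2 with a state reachable from both i and j in exactly
    t steps, or None if the pair has no common consequent."""
    n = len(gamma)
    from_i, from_j = {i}, {j}
    for t in range(1, n * (n - 1) // 2 + 1):
        from_i = set().union(*(gamma[s] for s in from_i))
        from_j = set().union(*(gamma[s] for s in from_j))
        if from_i & from_j:
            return t
    return None


# Hypothetical graphs: a chain 0 -> 1 -> 2 with a loop at 2, and a 2-cycle.
chain_graph = [{1}, {2}, {2}]
two_cycle = [{1}, {0}]
```

In the first graph the states 0 and 1 first meet after two steps, so `common_consequent_order(chain_graph, 0, 1)` is 2; in the 2-cycle the two states swap places forever and the function returns `None`.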


Remark: It is not known whether the bound given in Lemma 4.2 is sharp. 
It can be shown however that the difference between the above bound and any 
sharper bound is of the order of magnitude n/2 where n = |S| [see Exercise 
4.4]. 


Definition 4.3: A stochastic matrix is called scrambling if every pair of states 
in the associated graph has a common consequent of order 1. 


Lemma 4.3: Let P be a finite stochastic matrix; $\gamma(P) > 0$ if and only
if P is scrambling.


Proof: $\min_{i_1, i_2} \sum_j \min(p_{i_1 j}, p_{i_2 j}) > 0$ if and only if
for any $i_1$ and $i_2$ there is a j with both $p_{i_1 j} > 0$ and
$p_{i_2 j} > 0$. ∎


Theorem 4.4: Let P be a finite stochastic matrix. P satisfies $H_2$ if and
only if there is an integer $k \le n(n - 1)/2$ such that $\gamma(P^k) > 0$.


Proof: If P satisfies $H_2$ then, by Lemma 4.2, there is a
$k \le n(n - 1)/2$ such that $P^k$ is scrambling [the common consequent
property is hereditary, i.e., if two states have a common consequent of order
n then they have a common consequent of any order $\ge n$], so that by
Lemma 4.3 $\gamma(P^k) > 0$. If there is k with $\gamma(P^k) > 0$, then $P^k$
is scrambling, i.e., P satisfies $H_2$. ∎


Corollary 4.5: It is decidable whether a finite homogeneous Markov chain is
ergodic or not.
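The decision procedure behind Theorem 4.4 and Corollary 4.5 can be sketched as follows. The code assumes the form of $\gamma(P)$ that appears in the proof of Lemma 4.3 (the smallest value, over pairs of rows, of the sum of the entrywise minima); the matrices are hypothetical, and floating-point powers may underflow for large matrices, so the positivity test is reliable only for small examples.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def gamma_coeff(p):
    """min over row pairs of sum_j min(p_i1j, p_i2j) (assumed definition of gamma)."""
    return min(sum(min(x, y) for x, y in zip(r, s)) for r in p for s in p)


def is_scrambling(p):
    """True if every pair of rows has a positive entry in a common column."""
    return all(any(x > 0 and y > 0 for x, y in zip(r, s)) for r in p for s in p)


def scrambling_power(p):
    """Smallest k <= n(n-1)/2 with P^k scrambling, or None if there is none."""
    n = len(p)
    power = p
    for k in range(1, max(1, n * (n - 1) // 2) + 1):
        if is_scrambling(power):
            return k
        power = mat_mul(power, p)
    return None


# Hypothetical matrices: the first needs k = 2, the second is a permutation.
slow = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.5, 0.5, 0.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
```

A return value of `None` means that no power up to the bound n(n - 1)/2 is scrambling, which by the theorem settles the question in the negative.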


Definition 4.4: A stochastic system $(S, \{A(\sigma)\})$ satisfies condition
$H_4$ (of order k) if there is an integer k such that all the matrices A(x)
with $l(x) \ge k$ are scrambling.


Corollary 4.6: A stochastic system is weakly ergodic if and only if it
satisfies condition $H_4$.


Proof: By Lemma 4.3, Proposition 1.7, Theorem 3.1, and Definition 4.4. ∎


Remarks:

a. If all the matrices $A(\sigma)$ are equal one to the other (the
homogeneous case) then the condition $H_4$ reduces to the condition $H_2$.

b. It suffices that the matrices A(x) with $l(x) = k$ be scrambling for the
condition $H_4$ to be satisfied [see Exercise 4.6].


Theorem 4.7: If a stochastic system $(S, \{A(\sigma)\})$ satisfies the $H_4$
condition, then it satisfies this condition of order k with

$k \le \frac{1}{2}(3^n - 2^{n+1} + 1)$

where n = |S|.
Proof: Assume that there is a matrix A(x) with
$l(x) \ge \frac{1}{2}(3^n - 2^{n+1} + 1)$ and A(x) is not scrambling. Then
there are two states $i_1$ and $i_2$ which do not have a common consequent by
A(x). Let $x = \sigma_1 \cdots \sigma_k$ and consider the following sequence
of unordered pairs of sets of states

$(\alpha_0^1, \alpha_0^2), (\alpha_1^1, \alpha_1^2), \ldots, (\alpha_k^1, \alpha_k^2)$

where $\alpha_0^1 = i_1$, $\alpha_0^2 = i_2$ and $\alpha_{j+1}^1$,
$\alpha_{j+1}^2$ are the consequents of the states in $\alpha_j^1$,
$\alpha_j^2$ respectively by the matrix $A(\sigma_{j+1})$.

By the definition of the matrix A(x) and of the $\alpha$'s, we have that all
$\alpha$'s are nonvoid sets and every pair of $\alpha$'s is a disjoint pair of
sets. Let $\alpha_j^3$ denote the set of states in S which are not in
$\alpha_j^1 \cup \alpha_j^2$. There are $3^n$ different partitions of S into 3
disjoint subsets $\alpha_j^1, \alpha_j^2, \alpha_j^3$, but $2^{n+1} - 1$ of
these have $\alpha_j^1$ or $\alpha_j^2$ or both empty. [There are $2^n$
partitions of S into two sets $\alpha_j^1$ and $\alpha_j^3$, or $\alpha_j^2$
and $\alpha_j^3$, but the partition with $S = \alpha_j^3$ is counted in both
cases.] Thus there are $3^n - 2^{n+1} + 1$ ordered partitions
$(\alpha_j^1, \alpha_j^2, \alpha_j^3)$ of S such that both $\alpha_j^1$ and
$\alpha_j^2$ are not empty. If the order between $\alpha_j^1$ and $\alpha_j^2$
is not taken into account, then the number of such partitions reduces to
$\frac{1}{2}(3^n - 2^{n+1} + 1)$. This argument implies that there are two
equal pairs in the above sequence, say
$(\alpha_p^1, \alpha_p^2) = (\alpha_q^1, \alpha_q^2)$, $p < q \le k$. It
follows that any matrix of the form

$A(\sigma_1 \cdots \sigma_p (\sigma_{p+1} \cdots \sigma_q)^r), \quad r = 1, 2, \ldots$

is not scrambling and the condition $H_4$ is not satisfied. ∎


Corollary 4.8: It is decidable whether a given stochastic system satisfies the
$H_4$ condition.


Proof: By Lemma 4.3, Theorem 4.7, and Definition 4.4. ∎
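One possible way to organize the decision procedure of Corollary 4.8 is sketched below. It is not the procedure of the text, which checks the matrices A(x) word by word: the sketch keeps only the zero patterns of the nonscrambling products, relying on the fact that whether a product of nonnegative matrices is scrambling depends only on those patterns, and on the fact that a product having a scrambling factor is scrambling. The search stops at the bound of Theorem 4.7, and the example systems are hypothetical. At least two states are assumed.

```python
def support(p):
    """Zero pattern of a matrix: for each row, the set of columns with p_ij > 0."""
    return tuple(frozenset(j for j, v in enumerate(row) if v > 0) for row in p)


def compose(g, h):
    """Zero pattern of the product of two matrices with patterns g and h."""
    return tuple(frozenset().union(*(h[j] for j in row)) for row in g)


def is_scrambling(g):
    """True if every pair of rows shares a column."""
    return all(g[i] & g[j] for i in range(len(g)) for j in range(i + 1, len(g)))


def scrambling_order(system):
    """Smallest k such that all A(x) with l(x) = k are scrambling, or None if the
    bound (3^n - 2^(n+1) + 1) / 2 of Theorem 4.7 is reached first."""
    letters = [support(p) for p in system.values()]
    n = len(letters[0])
    bound = (3 ** n - 2 ** (n + 1) + 1) // 2
    bad = {g for g in letters if not is_scrambling(g)}   # length-1 failures
    for k in range(1, bound + 1):
        if not bad:
            return k
        bad = {c for g in bad for h in letters
               for c in [compose(g, h)] if not is_scrambling(c)}
    return None


# Hypothetical systems.
fast = {"a": [[0.5, 0.5], [0.0, 1.0]], "b": [[1.0, 0.0], [0.5, 0.5]]}
slow = {"a": [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.5, 0.5, 0.0]]}
stuck = {"a": [[0.0, 1.0], [1.0, 0.0]], "b": [[0.5, 0.5], [0.5, 0.5]]}
```

The number of distinct zero patterns can still be very large, so the sketch shares the practical limitation of any exhaustive check; it only avoids revisiting words that lead to the same pattern.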
Remark: To decide that a given system does not satisfy $H_4$ one must check
all matrices A(x) with $l(x) \le \frac{1}{2}(3^n - 2^{n+1} + 1)$, which will
make the procedure difficult and, for large n, even impracticable. One may
facilitate the computation by disregarding any matrix A(x) which has a
scrambling matrix as a factor, for any such matrix is a priori scrambling [see
Exercise 4.6]. On the other hand it is shown in the following theorem that the
bound of Theorem 4.7 is sharp and cannot be improved in general.


Theorem 4.9: The bound in Theorem 4.7 is sharp.

Proof: Fix n, let K be a set of n states and let the following sequence be any
enumeration of all different unordered pairs of nonvoid disjoint sets of
states from K:

$(\alpha_0^1, \alpha_0^2), (\alpha_1^1, \alpha_1^2), \ldots, (\alpha_k^1, \alpha_k^2)$    (2)

such that the number of states in any set of the form
$\alpha_i = \alpha_i^1 \cup \alpha_i^2$ is not smaller than in the set
$\alpha_{i-1}$, for i = 1, 2, ..., k. As stated before,
$k + 1 = \frac{1}{2}(3^n - 2^{n+1} + 1)$.

If $\varphi$ is a set of states and A(x) is a matrix in a system, denote by
$A(x, \varphi)$ the set of states which are consequents of those in $\varphi$
by A(x). Let $(K, \{A(\sigma)\})$ be a system such that $|\Sigma| = k$ and the
matrices $A(\sigma_1), \ldots, A(\sigma_k)$ satisfy the following property:

$A(\sigma_i, \varphi) = \begin{cases} \alpha_i^1 & \text{if } \varphi \subseteq \alpha_{i-1}^1, \\ \alpha_i^2 & \text{if } \varphi \subseteq \alpha_{i-1}^2, \\ K & \text{otherwise.} \end{cases}$    (3)

Note that the number of states in $A(\sigma_i, \varphi)$ can be smaller than
in $\varphi$ only in the first or second case in (3). This follows from the
definition of sequence (2), and we shall refer to this property as the
conditional monotone property. Note also that if (3) is satisfied for
one-element sets, it is satisfied for any sets.

We will show now that the stochastic system as defined above satisfies the
$H_4$ condition, but there is a word $x \in \Sigma^*$ with $l(x) = k$ such
that A(x) is not scrambling.

The second assertion follows from the fact that the matrix
$A(x) = A(\sigma_1 \sigma_2 \cdots \sigma_k)$ is not scrambling by the
definition of the sequence (2) and by (3).

To prove the first assertion, assume that there is a matrix
$A(x) = A(\sigma_{i_1} \cdots \sigma_{i_t})$ which is not scrambling and such
that $l(x) = t > k$. Thus there are two states $i_1$ and $i_2$ not having a
common consequent by A(x). Set $\beta_0^1 = i_1$, $\beta_0^2 = i_2$,
$\beta_j^1 = A(\sigma_{i_j}, \beta_{j-1}^1)$ and
$\beta_j^2 = A(\sigma_{i_j}, \beta_{j-1}^2)$ and consider the following
sequence

$(\beta_0^1, \beta_0^2), \ldots, (\beta_t^1, \beta_t^2)$    (4)
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This is a sequence of unordered pairs of nonvoid disjoint [by assumption] sets 
of states, and as t > k + 1, the sequence contains at least two equal pairs, say 


(B, bA a (Bi BA. P < q (5) 
Consider the following subsequence of (4) 
(B,', B,), LONE (fi, B, (B, B,), sees (AR BA (6) 


As before, we shall denote by £, the set 8, = 8; U B?. 

The matrix A(o,,) transforms the sets (fi! ,, B? ,) into the sets (8,!, f,?), but 
A(c,) is one of the A(c;)s say 4(c,) = A(o,). The following cases must be 
considered : 


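(a) $\beta_{r-1}^1 \cap (K - \alpha_{i-1}) \ne \emptyset$ or $\beta_{r-1}^2 \cap (K - \alpha_{i-1}) \ne \emptyset$.
This is impossible, for this would imply that $\beta_r^1 \cap \beta_r^2 \ne \emptyset$
by (3), contrary to the assumption that these sets are disjoint.

(b) $\beta_{r-1} \subseteq \alpha_{i-1}^1$ or $\beta_{r-1} \subseteq \alpha_{i-1}^2$,
which is also impossible, as in this case we get that

$$\beta_r^1 = A(\sigma_{i_r}, \beta_{r-1}^1) = \alpha_i^1 \ (\text{or } \alpha_i^2) = A(\sigma_{i_r}, \beta_{r-1}^2) = \beta_r^2$$

contrary, by (3), to our assumption that $\beta_r^1 \cap \beta_r^2 = \emptyset$.

(c) $\beta_{r-1}^1 \cap \alpha_{i-1}^1 \ne \emptyset$ together with $\beta_{r-1}^1 \cap \alpha_{i-1}^2 \ne \emptyset$, or
$\beta_{r-1}^2 \cap \alpha_{i-1}^1 \ne \emptyset$ together with $\beta_{r-1}^2 \cap \alpha_{i-1}^2 \ne \emptyset$,
which is also impossible, as in this case we get, by (3), that
$\beta_r^1 \cap \beta_r^2 \ne \emptyset$.

(d) $\beta_{r-1}^1 \subseteq \alpha_{i-1}^1$ together with $\beta_{r-1}^2 \subseteq \alpha_{i-1}^2$, or
$\beta_{r-1}^1 \subseteq \alpha_{i-1}^2$ together with $\beta_{r-1}^2 \subseteq \alpha_{i-1}^1$,
and the inclusion is proper in at least one part of the conditions, which is also
impossible, since by the conditional monotone property and by the impossibility
of case (b) [applying the same argument to all pairs in sequence (6)], we get
that the number of states in $\beta_q$ is larger than that in $\beta_p$, contrary
to (5).

(e) $\beta_{r-1}^1 = \alpha_{i-1}^1$ together with $\beta_{r-1}^2 = \alpha_{i-1}^2$, or
$\beta_{r-1}^1 = \alpha_{i-1}^2$ together with $\beta_{r-1}^2 = \alpha_{i-1}^1$.
In this case we get that sequence (6) is a middle part of sequence (2), which
is impossible since all the sets in (2) are different, contrary to (5).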

All possible cases are covered by (a)-(e), and the proof is complete. Note
that the bound in Theorem 4.7, although sharp, is independent of the number
of letters in the alphabet $\Sigma$. On the other hand the number of letters in the
counterexamples of Theorem 4.9 grows with n. It would therefore be interesting
to find out whether the bound in Theorem 4.7 can be improved under the
condition that the number of letters in $\Sigma$ is kept fixed or small [say 2 letters].
No answer to this question is presently available.


Definition 4.5: A stochastic system $(S, \{A(\sigma)\})$ is definite [of order k] if
there is an integer k such that all the matrices A(x) with $l(x) \ge k$ are
constant and this property does not hold true for all words x with $l(x) < k$.


Corollary 4.10: If $(S, \{A(\sigma)\})$ is a definite stochastic system of order k and
y = ux is a word such that $l(x) = k$, $l(u) > 0$ [$l(y) \ge k$] then A(y) = A(x).

Proof: A(y) = A(u)A(x) = A(x), by Exercise 1.4 in the preliminary section
since A(x) is constant.


A final problem to be discussed in this section is the decision problem for 
definite stochastic systems. This problem is solved by the following. 


Theorem 4.11: If a stochastic system $(S, \{A(\sigma)\})$ such that $|S| = n$ is
definite of order k, then $k \le n - 1$.


Proof: Denote by V the set of all n-dimensional vectors
$\bar v = (v_1, \ldots, v_n)$ such that $\sum_i v_i = 0$; denote by $H^i$ the set of
matrices $H^i = \{A(x) : l(x) = i\}$ and denote by $VH^i$ the linear closure of the
set of vectors of the form $\bar v A(x)$, $\bar v \in V$, $A(x) \in H^i$, i.e.,

$$VH^i = \Bigl\{ \sum_{j=1}^{r} \bar v_j A(x_j) : \bar v_j \in V,\ A(x_j) \in H^i,\ r = 0, 1, \ldots \Bigr\}$$

Then (a) V is a linear space, (b) $VH^i$ is a linear space and $VH^i \subseteq V$.

To prove (b) we note that any vector of the form $\bar v A(x)$ is in V, which is
closed under addition; the set $VH^i$ is closed under vector addition by definition,
and is closed under multiplication by a constant because the set V is closed
under such multiplication [i.e., $c \sum_j \bar v_j A(x_j) = \sum_j (c \bar v_j) A(x_j) = \sum_j \bar v_j' A(x_j)$].

(c) $VH^{i+1} \subseteq VH^i$. This follows from the fact that $VH \subseteq V$ and
$VH^{i+1} = (VH)H^i \subseteq VH^i$.

(d) If for some i, $VH^i = VH^{i+1}$, then $VH^i = VH^{i+j}$, j = 1, 2, .... This
follows from the fact that $VH^{i+2} = (VH^{i+1})H$.

(e) If the system is definite of order k then $VH^k$ is the space containing the
zero vector as its single element [i.e., $\dim VH^k = 0$], but this is not true for
$VH^i$, i < k.

This follows from the fact that if and only if A(x) is constant then
$\bar v A(x) = 0$ for all $\bar v \in V$.

Consider now the sequence of linear spaces

$$V\ (= VH^0) \supseteq VH \supseteq VH^2 \supseteq \cdots \supseteq VH^i \supseteq \cdots$$

Because of property (d) this sequence must have the form

$$VH^0 \supset VH \supset VH^2 \supset \cdots \supset VH^p = VH^{p+1} = VH^{p+2} = \cdots$$

[the sequence cannot decrease indefinitely because dim V = n − 1]. Thus, if
the system is definite of order k, then necessarily $VH^p = VH^k = \{0\}$ by property
(e) so that $n - 1 = \dim VH^0 > \dim VH^1 > \cdots > \dim VH^k = 0$. Hence,
$k \le n - 1$.


Corollary 4.12: If P is a stochastic matrix of order n such that $P^k$ is constant
but $P^{k-1}$ is not, then $k \le n - 1$.


Corollary 4.13: It is decidable whether a given stochastic system is definite. 
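
The chain of subspaces used in the proof of Theorem 4.11 suggests one way of carrying out the decision. The following is a minimal sketch under stated assumptions, not a procedure taken from the text: it uses exact rational arithmetic, represents each space $VH^i$ by a basis, and the function names are hypothetical.

```python
from fractions import Fraction

def row_basis(vectors):
    """Basis of the span of the given vectors (elimination over the rationals)."""
    basis = []
    for v in vectors:
        v = list(v)
        for b in basis:
            pivot = next(i for i, x in enumerate(b) if x != 0)
            if v[pivot] != 0:
                factor = v[pivot] / b[pivot]
                v = [x - factor * y for x, y in zip(v, b)]
        if any(x != 0 for x in v):
            basis.append(v)
    return basis

def vec_mat(v, m):
    """Row vector times matrix."""
    n = len(m)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]

def definite_order(matrices):
    """Return k if the system is definite of order k, otherwise None."""
    n = len(matrices[0])
    mats = [[[Fraction(x) for x in row] for row in m] for m in matrices]
    # Basis of V = {v : v_1 + ... + v_n = 0}: the vectors e_i - e_n.
    basis = [[Fraction(int(j == i) - int(j == n - 1)) for j in range(n)]
             for i in range(n - 1)]
    for k in range(n):                      # inspect VH^0, ..., VH^(n-1)
        if not basis:                       # VH^k = {0}
            return k
        # VH^(k+1) is spanned by the images of a basis of VH^k under each A(sigma).
        basis = row_basis([vec_mat(v, m) for v in basis for m in mats])
    return None
```

Entries are best passed as `Fraction` values or integers, since binary floating-point numbers such as 1/3 are not converted exactly.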


Example: Consider the following set of 3 x 3 matrices 


1 1 1 1 1 1 
2 9? 14 1 3 4 
Ao)=|% à $4 AM)=[% $3 b 
1 1 1 1 1 1 
4 9» X 1 9 4 
Straightforward computation shows that 
5 1 T 
74 7 3-4 
Alc) = A(0,0,) =| & 4 " 
5 1 7 
24 Z2 24 
and 
3 1 5 
is 2 316 
4A(0,0;) = A(0,0;) =| 4 «| 
3 1 5 
16 2 316 


This system is therefore definite of order 2. 


EXERCISES 


1. Prove that the property of being a nontransient state is decidable and find 
an optimal algorithm for deciding it. [A property is decidable if there is an 
algorithm with the aid of which one can decide, after finitely many steps, 
whether an element of a certain class has or has not the property.] 

2. Prove that the relation of being in the same ergodic class is an equivalence 
relation. 

3. Prove that the relation of being in the same periodic class is an equivalence 
relation. 

4. Find a graph (S, I) such that |S| = n, it satisfies the H, property, but there
is a pair of states i, j € S which do not have a common consequent of order m 
[m is a function of n] where m is as close as possible to the bound of Lemma 
4.2. 


5. Provide a full proof for Corollary 4.5. 


6. Let P and Q be stochastic finite matrices. Prove that the product PQ is 
scrambling if one of the matrices P or Q [or both] is scrambling. 
7. On the basis of Theorem 4.7 and Exercise 4.6 above, give an algorithm for
deciding whether a given stochastic system satisfies H}. 


8. Consider the following condition: A stochastic matrix P satisfies condition 
H, of order k if there is an integer k and a state j such that j is a consequent 
of order k of all the states (including j). Prove: If P has finite order, then the 
conditions H, and H, are equivalent. 

9. Let f be a 1-1 function from the set of all disjoint unordered pairs of
integers into the set of integers. Let $P = [p_{ij}]$ be an infinite stochastic matrix
such that $p_{ij} \ne 0$ if and only if there is a k with f(i, k) = j. Show that P has
the H, property, but it does not have the H, property.

10. Find an infinite stochastic matrix P which satisfies the H, property of order 
1 but y(P) = 0. [Compare with Lemma 4.3.] 

11. Find an infinite stochastic matrix P such that $\lim_{n\to\infty} d(P^n) = 0$ but
$\lim_{n\to\infty} \delta(P^n) = 1$.


In the following exercises it is assumed that the matrices are of finite order. 
12. Let P be a stochastic matrix. Prove that there is a stochastic matrix Q
such that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} P^m = Q$$

13. The matrix Q in Exercise 12 is constant if and only if there is a single
ergodic class in the graph associated with the given matrix.
14. If, and only if, there are no periodic subclasses in any ergodic class of P,
then

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} P^m = \lim_{n\to\infty} P^n = Q$$

and Q is constant.


15. Show that in any of the Exercises 12-14 the matrix Q satisfies the equation 
QP = PQ = Q or Q[I — P] = 0, providing a means for computing it. 


16. Prove that if the graph associated with a matrix P contains a single ergodic
class, then there is a unique solution to the system of n + 1 equations

$$(x_1, \ldots, x_n)[I - P] = 0$$
$$x_1 + x_2 + \cdots + x_n = 1$$

17. Let $P_r$ be the (n − 1)-dimensional matrix obtained from P by subtracting
the rth row from all its rows and then deleting the rth row and column. Let
$\bar x_r$ be the vector obtained from the vector x by deleting its rth entry. Prove:
a. If, and only if, there is a single ergodic class in the graph of P, then
$\det(I - P_r) \ne 0$ for any r.

b. Let $\xi_r$ be the rth row of P [with its rth entry deleted], then the solution to
the system of equations in Exercise 16 [given that there is a single ergodic set
in the graph of P] is

$$(x_1, \ldots, x_{r-1}, x_{r+1}, \ldots, x_n) = \xi_r (I - P_r)^{-1}, \qquad x_r = 1 - \sum_{i \ne r} x_i$$
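18. If

$$P = \begin{bmatrix} \tfrac12 & \tfrac14 & \tfrac14 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac14 & \tfrac14 & \tfrac12 \end{bmatrix}$$

compute $\lim_{n\to\infty} P^n$.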

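19. Let $\{P(\sigma)\}$ be a set of stochastic matrices, let $P_r(\sigma)$ be defined as
in Exercise 17 and let $\xi_r(\sigma)$ be the rth row of $P(\sigma)$ with the rth entry
deleted. Let $x = \sigma_1 \cdots \sigma_k$. a. Prove by induction that

$$\xi_r(x) = \xi_r(\sigma_1)P_r(\sigma_2 \cdots \sigma_k) + \xi_r(\sigma_2)P_r(\sigma_3 \cdots \sigma_k) + \cdots + \xi_r(\sigma_{k-1})P_r(\sigma_k) + \xi_r(\sigma_k)$$

b. Show that with the aid of the above formula one can compute the entries
of a 2-dimensional matrix P(x) directly from the values of the 2-dimensional
matrices $P(\sigma)$ as follows:

$$P_{12}(x) = P_{12}(\sigma_k) + \sum_{i=1}^{k-1} P_{12}(\sigma_i) \prod_{j=i+1}^{k} \bigl(P_{22}(\sigma_j) - P_{12}(\sigma_j)\bigr)$$

$$P_{21}(x) = P_{21}(\sigma_k) + \sum_{i=1}^{k-1} P_{21}(\sigma_i) \prod_{j=i+1}^{k} \bigl(P_{11}(\sigma_j) - P_{21}(\sigma_j)\bigr)$$

$$P_{11}(x) = 1 - P_{12}(x); \qquad P_{22}(x) = 1 - P_{21}(x)$$

20. Show that if $P = [P_{ij}]$ is a 2-dimensional matrix, then $\det P = P_{11} - P_{21}$.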


21. Let P be an SIA matrix of order n and let 4,* denote the rth column of 
P*. Show that the set of vectors {y,"} are all contained in an (n — 1)-dimen- 
sional subspace of the n-dimensional Euclidean space. 


22. Show by examples that it is possible to have two finite stochastic matrices 
A and B such that 


a. $\gamma(A^k) = 0 = \gamma(B^k)$ for all integers k, but there is an integer k such
that $\gamma((AB)^k) > 0$.

b. There is an integer k such that $\gamma(A^k) > 0 < \gamma(B^k)$ but for all integers
k, $\gamma((AB)^k) = 0$.
23. Two stochastic matrices P and Q are called similar (P ~ Q) if they have
the same associated graph. Prove that if Q is an SIA matrix and PQ ~ P, then
P is scrambling.
24. Let $(S, \{A(\sigma)\})$ be a stochastic system. Prove that all the matrices A(x) are
SIA if and only if the system is weakly ergodic. Let t be the number of all the
different graphs associated with |S|-dimensional SIA matrices. Prove that if
the given system has the property that all the matrices of the form A(x) are
SIA, then all the matrices A(x) with l(x) > t + 1 are scrambling [use Exercise
23].


25. Let (S, (4(0)]) be a weakly ergodic Markov system. Prove that 
E || A(yx) — lim A(x")|| = 0 
(x)= noo 

for any word y. 


26. Let $(S, \{A(\sigma)\})$ be a stochastic system such that |S| = n and having the
following property: For any $\sigma \in \Sigma$, if $\alpha$ and $\beta$ are two disjoint subsets of S
and also $A(\sigma, \alpha)$ and $A(\sigma, \beta)$ are disjoint then
$|A(\sigma, \alpha) \cup A(\sigma, \beta)| > |\alpha \cup \beta|$.
Prove that any such system satisfies the condition H; of order n − 1 and prove
that the bound n − 1 above is sharp for such systems.


27. Let (S, (.A(0)]) be a system such that all the matrices A(c) have the same 
graph which satisfies H,. Then the system is weakly ergodic. 


28. Find a sequence of infinite stochastic matrices $P_k$ such that for every
integer k, $P_k^k$ is scrambling but $P_k^{k-1}$ is not.


29. Find a sequence of infinite state systems of stochastic matrices S, = {A,,(x)} 
such that S, satisfies H, of order k but S, does not satisfy H, of smaller order. 


30. Find a sequence of stochastic infinite matrices P, such that P,” satisfies H, 
but Pt^! does not satisfy H}. 


31. Show that there exists an infinite stochastic matrix P such that P satisfies
H, but $\gamma(P^k) = 0$ for k = 1, 2, ....


OPEN PROBLEMS 


1. Let P be an infinite stochastic matrix and assume that there is an integer k
such that $\gamma(P^k) > 0$. Does this imply that P satisfies H,?


2. Is the condition “$\gamma(P^k) > \epsilon > 0$ for some $\epsilon$ and some integer k” decidable
for infinite stochastic matrices P?


3. Is the condition that $\lim_{n\to\infty} d(P^n) = 0$ implied by H, or H, for infinite
stochastic matrices?


4. Find a sharp bound for Lemma 4.2 or show that the given bound is sharp. 


5. Improve the bound of Theorem 4.7 under the assumption that the alphabet 
is bounded [e.g., 2 letters] or show that it is impossible to improve the bound. 
We shall list here for future reference the properties of the eigenvalues of
stochastic matrices without going into details. A detailed account on these 
properties, as well as their proofs, can be found in the book of Frechet (1938). 
See also Feller (1957), Frazer, Duncan, and Collar (1938), Turnbull and Aitken 
(1932), Turakainen (1968) and Yasui and Yajima (1969). 

Let A be a stochastic matrix $A = [a_{ij}]$ and let $\lambda_1, \ldots, \lambda_r$ be the
distinct eigenvalues of A [$r \le n$ = the order of A]. Then


1. $|\lambda_i| \le 1$ for i = 1, 2, ..., r.

2. There is an index i such that $\lambda_i = 1$.

3. If and only if the eigenvalue $\lambda_i = 1$ is simple, there is a single ergodic
class in the graph of A.

4. Let $A^m = [a_{ij}^{(m)}]$, then for m > n the following identity holds

$$a_{ij}^{(m)} = \sum_{u=1}^{r} \lambda_u^m\, \omega_{iju}(m)$$

where $\omega_{iju}(m)$ is a polynomial in m of smaller order than the multiplicity of
$\lambda_u$.
5. There are periodic classes in the graph of A if and only if there are
eigenvalues $\lambda_u$ such that $\lambda_u \ne 1$ but $|\lambda_u| = 1$, in which case all these
$\lambda_u$ are roots of unity, and the subsum corresponding to these eigenvalues in
the formula in 4 above is not identically equal to zero.

6. If the eigenvalues of A are all simple then the formula in 4 reduces to 


n n 
dP = È E PPT APPA" 


where $x_i^{(u)}$ and $y_i^{(u)}$ are the ith entries in the column or row eigenvector,
correspondingly, of the eigenvalue $\lambda_u$.

7. If the eigenvalues of A are all simple, then A can be written in the form
$A = \sum_{i=1}^{n} \lambda_i A_i$, where $\lambda_1 = 1$, and the $A_i$'s are square matrices
such that $A_i A_j = 0$ if $i \ne j$, $A_i^2 = A_i$ and $A_1 = \lim_{m\to\infty} A^m$, if the
limit exists.

8. If A and B are two stochastic matrices which commute and have simple
eigenvalues, then they both have the same $A_1$ [i.e., they both have the same
limiting matrix, if it exists] and the same $A_i$'s, i > 1, will appear in the
expansion in 7.

9. Let the formula in (4) be written in the form

$$\sum_{u} \lambda_u^m\, \omega_{iju}(m) = \omega_{ij}(m) + \epsilon_{ij}(m)$$

where $\omega_{ij}(m)$ is the subsum corresponding to the eigenvalues $\lambda_u$ such that
$|\lambda_u| = 1$ and $\epsilon_{ij}(m)$ is the remaining subsum. Then $\omega_{ij}(m)$ is a periodic
function of m [over the integers] having finitely many values and $\epsilon_{ij}(m)$ is a
function of m such that $\lim_{m\to\infty} |\epsilon_{ij}(m)| = 0$.
Some of the properties listed above will be used in subsequent sections. Some
others should be used in the proofs of the following exercises; the rest of them
are given for the sake of completeness. Unfortunately the properties of the
eigenvalues of individual matrices $A_i$ have very little to do, in general, with the
properties of the eigenvalues of their products of the form $\prod A_i$. Therefore
the main use of the properties of eigenvalues is for the homogeneous case [see,
however, the above-cited works of Turakainen (1968) and Yasui and Yajima
(1969)]. In that case (the homogeneous) there is a strong connection between
the properties of the eigenvalues and the classification of states given in the
previous section. This is shown by properties 3, 5, and 9.
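
For a two-state matrix the expansion of property 7 can be written out explicitly, which may help in reading the list above. The sketch below is an illustration only: it assumes a matrix of the form [[1 − p, p], [q, 1 − q]] with p + q > 0, for which the eigenvalues are 1 and 1 − p − q, and the function names are hypothetical.

```python
from fractions import Fraction

def matmul(x, y):
    """Ordinary matrix product of two list-of-rows matrices."""
    return [[sum(x[i][m] * y[m][j] for m in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def power(a, m):
    """m-th power of a square matrix by repeated multiplication."""
    result = [[Fraction(int(i == j)) for j in range(len(a))] for i in range(len(a))]
    for _ in range(m):
        result = matmul(result, a)
    return result

def spectral_parts(a):
    """For A = [[1-p, p], [q, 1-q]], p + q > 0: (lam, A1, A2) with A = A1 + lam*A2."""
    p, q = Fraction(a[0][1]), Fraction(a[1][0])
    s = p + q
    lam = 1 - s                                   # the eigenvalue other than 1
    a1 = [[q / s, p / s], [q / s, p / s]]         # constant matrix, the limit if |lam| < 1
    a2 = [[p / s, -p / s], [-q / s, q / s]]
    return lam, a1, a2
```

With these matrices one has $A_1A_2 = A_2A_1 = 0$, $A_i^2 = A_i$, and hence $A^m = A_1 + \lambda^m A_2$, so that $A_1$ is the limit of $A^m$ whenever $|\lambda| < 1$.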


EXERCISES 


1. Let $(S, \{A(\sigma)\})$ be a finite state system such that all the eigenvalues of the
matrices $A(\sigma)$ are simple, $A(\sigma_i)A(\sigma_j) = A(\sigma_j)A(\sigma_i)$ for all i and j, and the
products of “corresponding” eigenvalues other than +1 tend to zero. Then
the system is strongly ergodic. [By corresponding eigenvalues we mean
eigenvalues corresponding to the same matrix $A_i$ in the expansion of $A(\sigma)$,
property 7. Because of property 8 all the matrices $A(\sigma)$ have the same matrices
$A_i$ in their expansion.]


2. Show by an example that there are stochastic matrices having the same limit
but which do not commute [$AB \ne BA$ but $\lim A^n = \lim B^n = Q$].


3. Prove that 2-state stochastic matrices which have the same limit commute. 


In the following exercises the matrices are assumed to be of order 2 and the
eigenvalue which differs from +1 (if there is such an eigenvalue) of a matrix
A will be denoted by $\lambda^A$.


4. Prove that if $A = [a_{ij}]$ is a two state stochastic matrix then
$\lambda^A = \det A = a_{11} - a_{21}$.


5. If $A = [a_{ij}]$ and $B = [b_{ij}]$ and $AB = C = [c_{ij}]$ then

$$c_{12} = a_{12}\lambda^B + b_{12}; \qquad c_{21} = a_{21}\lambda^B + b_{21}$$


extend this formula by induction to longer products of 2-state stochastic 
matrices. 


6. Let X = (0, 1,..., d — 1} and define 


A) = 


Prove that 
A(X) = 09,0, 17: 0i 

where x = 0,,...,0, € Y,* and .0,0,.,--- G, isan ordinary d-ary fraction. 
7. Let (S, [4(0)]) be a system such that the ratio aj;(0)/a;(c) is independent 
of g and 4^? < 1 for all o then the system is strongly ergodic [find the limit- 
ing matrix]. 

8. Let (S, (P,)) be a two state Markov chain. If [],-,, 4" tends to some limit 
which can be calculated and the ratio between the 1, 2 element and the 2, 1 
element of P, is independent oni then the limit H,,, can be calculated [find the 
formula]. 


9. Find H,, where 


1 2n, 2n; 
PI P1 
P, — 
2n, 2n 
| B1 PI 


where n, m > 0 and n, + n, = 1. Hint: [[z; [m — 1)/ + 0] = &. 


10. Let $(S, \{A(\sigma)\})$ be an n-state stochastic system such that all the eigenvalues
of $A(\sigma)$ are simple for all $\sigma$, all the eigenvalues of $A(\sigma)$ different from +1 have
modulus < 1, and such that $\lim_{n\to\infty} A(\sigma_i^n) = \lim_{n\to\infty} A(\sigma_j^n)$ for all i and j
[the limit exists necessarily by the above required properties]. Then the system is
strongly ergodic.


11. Formulate and prove a theorem which parallels the theorem in Exercise 10 
for n-state Markov-chains. 


12.* Prove: For any integer n, there exist a finite set of stochastic matrices 
such that any probabilistic vector of order n having finite binary expansion, can 
be realized as a row in a finite product of these matrices [compare with 
Exercise 6 above]. 
Mott and Schneider (1957), and Paz (1963). 


B. OPERATION ON MARKOV SYSTEMS 


1. The Direct Sum and Product 


Definition 1.1: Let A and B be two square matrices, A of order r and B of
order s. The matrix

$$A \dotplus B = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$

of order r + s is called their direct sum. It is easily verified that

$$(A_1 \dotplus B_1)(A_2 \dotplus B_2) = A_1A_2 \dotplus B_1B_2 \tag{7}$$

provided that the right-hand side of the equation is defined. Trivially, the
direct sum of two stochastic matrices is stochastic.


Definition 1.2: Let $A = [a_{ij}]$ and $B = [b_{kl}]$ be two matrices [not necessarily
stochastic] of order m × n and p × q respectively [thus the matrices are not
necessarily square]. Then $A \otimes B$ denotes the Kronecker [or direct] product of
A and B where

$$A \otimes B = C = [c_{ik,jl}] = [a_{ij} b_{kl}]$$

The double indices ik, jl of the elements of C are ordered lexicographically

ik = 11, 12, ..., 1p, ..., m1, ..., mp;
jl = 11, ..., 1q, ..., n1, ..., nq

Note that the elements in the ikth row of C are products of elements in the
ith row of A and the kth row of B, and similarly for the jlth column of C. C can
thus be written in the form

$$C = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}$$
Lemma 1.1: Let $A = [a_{ij}]$, $B = [b_{ij}]$, $A' = [a'_{ij}]$ and $B' = [b'_{ij}]$ be matrices
such that the [ordinary] products AB and A'B' are defined. Then

$$(AB) \otimes (A'B') = (A \otimes A')(B \otimes B') \tag{8}$$

Proof: The ij entry in AB is $\sum_m a_{im} b_{mj}$. The kl entry in A'B' is
$\sum_n a'_{kn} b'_{nl}$. Therefore the ik, jl entry in $(AB) \otimes (A'B')$ is
$\sum_m a_{im} b_{mj} \sum_n a'_{kn} b'_{nl}$. The ik, mn entry in $A \otimes A'$ is $a_{im} a'_{kn}$.
The mn, jl entry in $B \otimes B'$ is $b_{mj} b'_{nl}$. Therefore the ik, jl entry in
$(A \otimes A')(B \otimes B')$ is

$$\sum_{m,n} a_{im} a'_{kn} b_{mj} b'_{nl} = \sum_m a_{im} b_{mj} \sum_n a'_{kn} b'_{nl}$$

as required.


Lemma 1.2: If A and B are stochastic matrices, then so is $A \otimes B$.
The proof is straightforward and is left to the reader. 


Definition 1.3: If $(S, \{A(\sigma)\})$ and $(S', \{A'(\sigma)\})$ are two stochastic systems over
the same alphabet $\Sigma$, then their direct sum is defined as
$(S \cup S', \{A(\sigma) \dotplus A'(\sigma)\})$ and their direct product as
$(S \times S', \{A(\sigma) \otimes A'(\sigma)\})$. It follows from (7) and (8) that the matrix
related to a word $x \in \Sigma^*$ is $A(x) \dotplus A'(x)$ in the sum system and
$A(x) \otimes A'(x)$ in the product system.
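
The two operations and the relations (7) and (8) can be checked numerically on small matrices. The sketch below is an illustration only; matrices are plain lists of rows, the arithmetic is exact, and the function names are hypothetical.

```python
from fractions import Fraction

def matmul(x, y):
    """Ordinary matrix product of two list-of-rows matrices."""
    return [[sum(x[i][m] * y[m][j] for m in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def kron(a, b):
    """Kronecker product; rows ik and columns jl are in lexicographic order."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def direct_sum(a, b):
    """Block-diagonal matrix with a in the upper left and b in the lower right."""
    r, s = len(a[0]), len(b[0])
    return [list(row) + [0] * s for row in a] + [[0] * r + list(row) for row in b]
```

Because the product of a word is obtained by multiplying the matrices of its letters, relations (7) and (8) are exactly what allows the sum and product systems to be evaluated componentwise.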


Lemma 1.3: Let $A = [a_{ij}]$ and $B = [b_{kl}]$ be two scrambling [see Definition
A.4.3] matrices, then $C = [c_{ik,jl}] = A \otimes B$ is also a scrambling matrix.


Proof: Let $i_1k_1$ and $i_2k_2$ be any two rows in C. A being scrambling, there is
$j_1$ such that $a_{i_1j_1} a_{i_2j_1} > 0$; similarly there is $l_1$ such that
$b_{k_1l_1} b_{k_2l_1} > 0$; this implies that $c_{i_1k_1,j_1l_1} > 0$ and
$c_{i_2k_2,j_1l_1} > 0$ and therefore the states labeled $i_1k_1$ and $i_2k_2$ have a
common consequent in the graph of C.


Corollary 1.4: Let $(S, \{A(\sigma)\})$ and $(S', \{A'(\sigma)\})$ be two quasidefinite stochastic
systems, then $(S \times S', \{A(\sigma) \otimes A'(\sigma)\})$ is a quasidefinite system.


Proof: The proof is straightforward and is left to the reader. 


Definition 1.4: Let $A = [a_{ij}]$ be a Markov matrix and let $\{B(q)\} = \{[b_{kl}(q)]\}$ be
a set of Markov matrices, one matrix for every state q of A. The cascade
product of A and {B(q)} is the matrix $C = [c_{ik,jl}] = [a_{ij} b_{kl}(i)]$.


Definition 1.4 above can easily be extended to Markov systems but property 
(8) in Lemma 1.1 does not apply here, and there is no simple relation between 
an entry in a matrix corresponding to a word in a cascade product and the 
entries in the components of the system corresponding to the same word. Once 
a cascade product is formed it can be further combined in cascade product 
with another set of matrices, and so forth. 

The reader who is familiar with deterministic automata theory will recognize 
that Definition 1.4 above is an extension of the parallel definition in the deter- 
ministic case. The graphical representation of a cascade product is given below 
in Figure 14. The two systems A and B are assumed to be Markovian. The
next state of A depends on the present state of A and on the present input (line 
represented by X). The next state of B depends on the present state of both A 
(line represented by Y), and B and on the present input. The state of the 
system C is represented by a pair of states, one state from A and one from B 
(lines Y and Z). The system may further be generalized by introducing a 
combinatorial gate between line Y and box B, and another combinatorial gate 
between the lines Y and Z, and the actual output of the system. 


Figure 14. Graphical representation of a cascade product of Markov matrices. 


Note that if the system B is independent of the input line Y, then the cascade
product reduces to the previously defined Kronecker product and the connection
between the two systems is a parallel connection.
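
The cascade product of Definition 1.4 and its reduction to the Kronecker product can be illustrated as follows. This is a sketch only: states are numbered from 0, the list `b_of` (a hypothetical name) holds one matrix B(i) for each state i of A, and the pair (i, k) is stored at row i·s + k.

```python
from fractions import Fraction

def kron(a, b):
    """Kronecker product; rows ik and columns jl are in lexicographic order."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def cascade(a, b_of):
    """Cascade product: entry (ik, jl) equals a[i][j] * b_of[i][k][l]."""
    r, s = len(a), len(b_of[0])
    return [[a[i][j] * b_of[i][k][l] for j in range(r) for l in range(s)]
            for i in range(r) for k in range(s)]
```

When all the matrices in `b_of` coincide, the second component no longer depends on the state of the first and `cascade` returns the same matrix as `kron`.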

In addition to the direct sum, Kronecker product, and cascade product de- 
fined above, one can define other forms of connections or combinations of 
connections. The basic problem is, however, to find conditions under which a 
given Markovian system can be decomposed into simpler parts, using these 
interconnections. This topic will be dealt with in the next section. 


EXERCISES 


1. Prove the relation (7).

2. Prove Lemma 1.2.

3. Prove Corollary 1.4.

4. Prove Corollary 1.4 for definite systems [see Definition A.4.5].

5. Prove that Corollary 1.4 holds true when one of the systems is quasidefinite
and the other is definite.


6. Prove that the box C in Figure 14 represents a Markovian system [i.e., its 
next state depends on its present state and present input only] provided that 
the systems A and B are such. 
Definition 2.1: A set S' of states of a Markov system A is a persistent
subsystem of A if and only if the set of states which are accessible from S' are in
S'.

Note that it follows from Definition 2.1 above that the submatrices of the
matrices of A corresponding to states in S' are Markov matrices.


Definition 2.2: A Markov system $(S, \{C(\sigma)\})$ is decomposable if and only if it is
isomorphic to a persistent subsystem of a cascade product of two (or more)
Markov systems such that the number of states of every component in the
product is smaller than the number of states of the given system.

Let $C = (S, \{C(\sigma)\})$ be a Markov system and assume that it is decomposable.
Then $C(\sigma)$ is a submatrix of the matrix $[c_{ik,jl}(\sigma)]$ [the row and column indices
have been written as double indices to facilitate the exposition], and [see
Definition 1.4] after a proper assignment of indices,

$$[c_{ik,jl}(\sigma)] = [a_{ij}(\sigma) \cdot b_{kl}(i, \sigma)] \tag{9}$$

where $A = (S', \{A(\sigma)\})$ and $B = (S'', \{B(i, \sigma)\})$ are Markov systems with
|S'| < |S| and |S''| < |S|. There may be entries $c_{ik,jl}(\sigma)$ in (9) which do not
belong to $C(\sigma)$, since it is required only that $S \subseteq S' \times S''$ [C is a persistent
submachine of the cascade product], in which case Eq. (9) contains “don't care”
conditions.

Summing up both sides of Eq. (9) over l and noting that $B(i, \sigma)$ are stochastic
for every i and $\sigma$, we have that for fixed i, k, and j

$$\sum_{l} c_{ik,jl}(\sigma) = a_{ij}(\sigma) \tag{10}$$

The right-hand side of (10) does not depend on k and therefore also the
left-hand side must have this property.

Summing up now both sides of Eq. (9) over j and noting that $A(\sigma)$ is stochastic
we have that for fixed i, k, and l

$$\sum_{j} c_{ik,jl}(\sigma) = b_{kl}(i, \sigma) \tag{11}$$

Combining (9), (10), and (11) we have that for every i, j, k, l the following
equation must hold true

$$\sum_{l'} c_{ik,jl'}(\sigma) \cdot \sum_{j'} c_{ik,j'l}(\sigma) = c_{ik,jl}(\sigma) \tag{12}$$
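
Relations (10), (11), and (12) can be tried out on a matrix that is known to be a cascade product. The sketch below is an illustration only, for a single letter and with states numbered from 0; the pair (i, k) is stored at index i·s + k, and the function names are hypothetical.

```python
from fractions import Fraction

def cascade(a, b_of):
    """Cascade product: entry (ik, jl) equals a[i][j] * b_of[i][k][l]."""
    r, s = len(a), len(b_of[0])
    return [[a[i][j] * b_of[i][k][l] for j in range(r) for l in range(s)]
            for i in range(r) for k in range(s)]

def recover_components(c, r, s):
    """Recover A by (10) and the matrices B(i) by (11)."""
    # (10): sum over l; the row with k = 0 is used, any other k must give the same value.
    a = [[sum(c[i * s][j * s + l] for l in range(s)) for j in range(r)]
         for i in range(r)]
    # (11): sum over j.
    b_of = [[[sum(c[i * s + k][j * s + l] for j in range(r)) for l in range(s)]
             for k in range(s)] for i in range(r)]
    return a, b_of

def satisfies_eq_12(c, r, s):
    """Check (12): each entry is the product of its two marginal sums."""
    for row in c:
        for j in range(r):
            for l in range(s):
                over_l = sum(row[j * s + m] for m in range(s))
                over_j = sum(row[m * s + l] for m in range(r))
                if over_l * over_j != row[j * s + l]:
                    return False
    return True
```

A matrix whose rows are not of product form, such as one with the row (1/2, 0, 0, 1/2), fails the test of (12) and therefore cannot arise from (9) with this assignment of indices.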


We are now able to formulate two necessary conditions for decomposability. 
Definition 2.3: A partition on the state set S of a system is a collection of sub- 
sets of S such that each state in S belongs to one and only one such subset. 
Each subset as above will be called a block of the partition. If the number of 
blocks is bigger than one and smaller than the number of states, then the 
partition is nontrivial. 
Let $C = (S, \{C(\sigma)\})$ be a Markovian system which is decomposable. It
follows from Eq. (10) above and the remark after it that the system must satisfy
the following:

Lumpability condition: There exists a nontrivial partition on the state set S
such that for any $\sigma$, the sum of the columns of the matrix $C(\sigma)$ corresponding
to any block of the partition, is a column having equal values in entries
corresponding to the same block of the partition.


Remark: One sees easily that the partition in the lumpability condition above
is represented in Eq. (10) by the first part of the row [or column] double index,
i.e., two states are in the same block if they have the same i in their row-ik
index (or the same j in their column-jl index). Thus, summing up all $c_{ik,jl}(\sigma)$
for fixed j [i.e., in a given block] results in a value which depends on i [i.e., on
the corresponding block] but not on k.
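
The lumpability condition is easy to test mechanically for a given partition. The following sketch is an illustration only; states are numbered from 0, a partition is a list of lists of states, and the function names are hypothetical.

```python
from fractions import Fraction

def is_lumpable(matrix, partition):
    """Block sums of each row must agree for all rows taken from the same block."""
    for target in partition:
        for source in partition:
            sums = {sum(matrix[i][j] for j in target) for i in source}
            if len(sums) > 1:
                return False
    return True

def system_is_lumpable(matrices, partition):
    """The same partition must work for the matrix of every letter."""
    return all(is_lumpable(m, partition) for m in matrices)

def lump(matrix, partition):
    """Block-to-block matrix, as in Eq. (10); assumes is_lumpable(matrix, partition)."""
    return [[sum(matrix[source[0]][j] for j in target) for target in partition]
            for source in partition]
```

The lumped matrix keeps only the transition probabilities between blocks, so different matrices may well lump to the same result.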

It follows now from (12) that if the system $(S, \{C(\sigma)\})$ is decomposable, one
must also have the following:

Condition of Separability: There exist two nontrivial partitions on the state
set, $\pi$ with blocks $\pi_j$ and $\tau$ with blocks $\tau_l$ such that: (1)
$|\pi_j \cap \tau_l| \le 1$ for all j and l; (2) if $\pi_j \cap \tau_l = jl$, then for all ik
and all $\sigma$

$$\sum_{l'} c_{ik,jl'}(\sigma) \cdot \sum_{j'} c_{ik,j'l}(\sigma) = c_{ik,jl}(\sigma)$$

Remark: The partitions $\pi$ and $\tau$ in the separability condition are represented
in Eq. (12) by the first and second part of the column [or row] index
correspondingly. Thus two states are in the same block of $\pi$ if they have the same
j and they are in the same block of $\tau$ if they have the same l in their jl-column
index.

The previous considerations suggest the following: 


Theorem 2.1: A Markov system $(S, \{C(\sigma)\})$ is decomposable if and only if it
satisfies the conditions of lumpability and separability with the same $\pi$ partition
in both conditions.


Proof: Necessity has been proved already. It is easy to show that the
conditions are also sufficient, for if a system $(S, \{C(\sigma)\})$ satisfies the two
conditions, then, by a proper reindexing of the entries of the matrices $C(\sigma)$ into
double indices: $C(\sigma) = [c_{ik,jl}(\sigma)]$ with i, j ranging over the blocks of $\pi$ and k, l
ranging over the blocks of $\tau$, one can define the matrices $A(\sigma)$ and $B(i, \sigma)$ by
way of the Eqs. (10) and (11). [If for some l and k, $\pi_l \cap \tau_k = \emptyset$ then this
represents a “don't care” condition and the corresponding entries in the
$B(i, \sigma)$ matrices can be chosen at will.]

The decomposition procedure will be illustrated in the following example. 


Example 13: Let $C = (S, \{C(\sigma)\})$ be a Markov system such that S =
{1, 2, 3, 4, 5} [for the sake of simplicity the states are identified with their index
if no ambiguity results], $\Sigma = \{a, b\}$, and




+ 0 4 0 3 10000 
104204 11000 
C(3-—-|]$ +} $40, CH=}0 7 03 0 
0$ 0210 +o 3 3 9 
003 $0 $0504 


Consider the partitions $\pi = (\pi_1, \pi_2, \pi_3) = (\{1, 2\}, \{3, 4\}, \{5\})$ and
$\tau = (\tau_1, \tau_2) = (\{1, 3, 5\}, \{2, 4\})$. It is easy to verify that $\pi$ satisfies the
lumpability condition and $\pi$ and $\tau$ satisfy the separability condition. Using
Eq. (10) we have $\sum_{j \in \pi_l} c_{ij}(\sigma) = a_{kl}(\sigma)$, $i \in \pi_k$; k, l = 1, 2, 3, or


1141 100 
Aa)-|$ 4 0, A=] 2 0 
010 444 


Using Eq. (11) now we have $\sum_{j \in \tau_l} c_{kj}(\sigma) = b_{il}(m, \sigma)$, if
$k \in \tau_i \cap \pi_m \ne \emptyset$; m = 1, 2, 3; i, l = 1, 2. If m = 3 and i = 2, then
$\tau_i \cap \pi_m = \emptyset$, and the values $b_{il}(m, \sigma)$ can be chosen for this case in an
arbitrary way subject to the condition that the $B(i, \sigma)$ matrices are stochastic.
Choosing $b_{21}(3, \sigma) = b_{22}(3, \sigma) = \tfrac{1}{2}$ for $\sigma = a, b$ we have


sa| o) seo- I} eos; i 
? Z 


1 0 0 1 1 0 
sa| 1p sao=[) b se»-[ jj 
5 35 Z 


and the decomposition is completely defined. 


Corollary 2.2: Let $(S, \{C(\sigma)\})$ be a Markov system such that there are two
nontrivial partitions $\pi$ and $\tau$ on its state set satisfying the following properties:


1. Both $\pi$ and $\tau$ satisfy the lumpability condition.
2. $\pi$ and $\tau$ satisfy the separability condition.


Then the system is decomposable into a Kronecker product of two systems.


Proof: Consider again Eq. (11) and let $b_{kl}(i, \sigma)$, $b_{kl}(j, \sigma)$ be two different
elements in its right-hand side with fixed k, l, $\sigma$. $b_{kl}(i, \sigma)$ is the sum of the
elements corresponding to the block $\tau_l$ of $\tau$ in a row corresponding to the block
$\pi_i$ of $\pi$ in the matrix $C(\sigma)$ [to be more specific, the index of the row is
$\pi_i \cap \tau_k$]. Similarly, $b_{kl}(j, \sigma)$ is the sum of the elements in the row with index
$\pi_j \cap \tau_k$ corresponding to the block $\tau_l$ of $\tau$ in $C(\sigma)$. It follows from the fact
that $\tau$ satisfies the lumpability condition that $b_{kl}(j, \sigma) = b_{kl}(i, \sigma)$, the
summation being over entries in the same block of $\tau$ ($\tau_l$) and the rows belonging
to the same block of $\tau$ ($\tau_k$). Thus $B(i, \sigma) = B(j, \sigma)$ for all pairs i, j so that the
B system can be represented in the form $(S'', \{B(\sigma)\})$ and is independent of the
state of the system A, which proves the corollary.


Remarks: 


a. Given a Markov system $(S, \{C(\sigma)\})$ which satisfies the lumpability
condition above one can still use Eq. (10) to define a new system $(S', \{A(\sigma)\})$ with
|S'| < |S| and such that the original system is homomorphic to the new one
[i.e., there is a mapping $\varphi$ from S to S' such that
$a_{ij}(x) = \sum_{l \in \varphi^{-1}(j)} c_{kl}(x)$, $k \in \varphi^{-1}(i)$ for all $x \in \Sigma^*$; the states in
S' will be the blocks of $\pi$ and if $m \in \pi_n$ then $\varphi(m) = n$]. The new system is,
however, not isomorphic to the original one which cannot be recovered back from
it. Some of the information on the transition probabilities from a particular state
to another is lost in the lumping process and only the information about the
transition probabilities from a block of states to another block is retained. [See
Exercises 1, 2 at the end of this section.]

b. The set of all partitions over a set of states, including the trivial
partitions, has a lattice structure. One can define a partial order $\le$ over partitions,
where $\pi \le \tau$ means that each block of $\tau$ is the union of one or more blocks
of $\pi$. Thus if S = {1, 2, 3, 4}, $\pi$ = ({1, 2}, {3}, {4}), $\tau$ = ({1, 2, 3}, {4}), then
$\pi \le \tau$.

Let 1 be the partition with all the states in a single block and 0 the partition
with each state in a separate block and, using the partial order defined above,
define $\pi + \tau$ to be lub($\pi$, $\tau$) and $\pi \cdot \tau$ to be glb($\pi$, $\tau$). Clearly $0 \le \pi \le 1$ for
any partition $\pi$ and, as the lattice of partitions over a finite set is finite, $\pi + \tau$
and $\pi \cdot \tau$ always exist. Thus ({1, 2}, {3}, {4, 5, 6}) + ({1}, {2, 3}, {4, 5}, {6}) =
({1, 2, 3}, {4, 5, 6}) and ({1}, {2}, {3}, {4, 5}, {6}) is the product of the above two
partitions. In addition to the above properties, one can also prove the following:


Theorem 2.3: If $\pi$ and $\tau$ are two partitions over the set of states $S$ of a Markov
system $(S, \{C(\sigma)\})$ such that both partitions satisfy the lumpability condition and
in addition


PECOLIPDO PET (13) 


then $\pi \cdot \tau$ is a partition satisfying the lumpability condition.


Proof: Because of the lumpability condition for both $\pi$ and $\tau$, the sum
$\sum_{j \in \pi_k} c_{ij}(\sigma)$ has the same value for all $i \in \pi_l$ and the sum
$\sum_{j \in \tau_m} c_{ij}(\sigma)$ has the same value for all $i \in \tau_n$, where $\pi_k$, $\pi_l$ and $\tau_m$, $\tau_n$
are arbitrary blocks in $\pi$ and $\tau$ respectively. It follows that the sum
$\sum_{j \in \pi_k \cap \tau_m} c_{ij}(\sigma)$ has the same value for all $i \in \pi_l \cap \tau_n$.
But $\pi_k \cap \tau_m$ and $\pi_l \cap \tau_n$ are arbitrary blocks of $\pi \cdot \tau$ and all the blocks of
$\pi \cdot \tau$ have this form, which proves the theorem.
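
The sum and product of partitions used above can be computed mechanically. The following Python sketch is an illustration only; the function names `meet`, `join` and `refines` and the representation of a partition as a list of sets are assumptions made here, not notation from the text. The data are the two partitions of the numerical example in Remark b.

```python
def meet(p, q):
    """Product p.q (glb): all nonempty intersections of a block of p with a block of q."""
    blocks = [set(a) & set(b) for a in p for b in q]
    return sorted(sorted(x) for x in blocks if x)

def join(p, q):
    """Sum p + q (lub): merge the blocks of p that are linked through a block of q."""
    blocks = [set(a) for a in p]
    for b in q:
        b = set(b)
        touching = [x for x in blocks if x & b]
        rest = [x for x in blocks if not x & b]
        blocks = rest + [b.union(*touching)]
    return sorted(sorted(x) for x in blocks)

def refines(p, q):
    """True if p <= q, i.e. every block of p lies inside some block of q."""
    return all(any(set(a) <= set(b) for b in q) for a in p)

pi = [{1, 2}, {3}, {4, 5, 6}]
tau = [{1}, {2, 3}, {4, 5}, {6}]
```

With these definitions, `meet(pi, tau)` refines both arguments and both arguments refine `join(pi, tau)`, as required of a greatest lower bound and a least upper bound.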

Using the algebra of partitions and the theorem above one can find all
possible pairs of partitions satisfying the necessary conditions for decomposition.
It is to be mentioned, however, that, in contrast to the deterministic case, there
exists no clear-cut theory of decomposition for Markov systems. The conditions
of lumpability and separability are restrictive and cannot both be satisfied in
general.

c. Generalizations of the results in this section can be achieved by
introducing combinatorial gates between the various parts in an interconnection
of systems or by combining various types of decomposition. In addition,
a decomposition can be carried through several steps, leading to more
than two component subsystems. These possibilities have been mentioned before.
There may also be decompositions based on interconnections more general than
the cascade type, as will be shown later. [See Exercises 9-12 at the end of this
section.] Still another possibility is state splitting. This will
be illustrated now by the following:


Example 14: Let $(S, \{C(\sigma)\})$ be the 3-state system over $\Sigma = \{a, b\}$ with


o 
ed 


2 1 

$ 3 
C(a)-—-|& $ $} C(b-—|$ $ $ 
2 1 0 H 1 1 
$ $ e$ 3 3$ 


An easy check will show that the above system is not decomposable. One can
try, however, to split some state into two, to get another 4-state system which
will be decomposable into two 2-state components. Suppose some state, say $s_i$,
is split into two (or more) states $s_i'$ and $s_i''$, i.e., the $i$th row in each matrix is
duplicated and then the $i$th column is divided into two columns whose sum is
equal to the original one. Trivially, the new system satisfies the lumpability
condition for the partition which merges the states $s_i'$ and $s_i''$ into a single
block and leaves all the other states alone. The new system is therefore
equivalent to the old one provided that the states $s_i'$ and $s_i''$ are merged at
its output, and a decomposition of the new system provides us, therefore, with
a decomposition of a system which is externally equivalent to the original one.
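
The splitting operation just described (duplicate a row, divide a column) can be sketched as follows. This is a minimal illustration: the function names, the 0-based indexing, the matrix `C` and the splitting weights are hypothetical choices made here, and finding weights that also make the split system decomposable is a separate problem that the sketch does not solve.

```python
from fractions import Fraction as F

def split_state(C, i, weights):
    """Split state i into two states placed at positions i and i + 1.

    weights[r] is the share of entry C[r][i] that goes to the first copy;
    the remainder goes to the second copy, so the two new columns add up
    to the original column i.
    """
    rows = [
        row[:i] + [row[i] * w, row[i] * (1 - w)] + row[i + 1:]
        for row, w in zip(C, weights)
    ]
    # Duplicate the i-th row so that both copies of the state behave alike.
    return rows[: i + 1] + [list(rows[i])] + rows[i + 1:]

def merge_columns(M, i):
    """Add columns i and i + 1 (undo the column division)."""
    return [row[:i] + [row[i] + row[i + 1]] + row[i + 2:] for row in M]

# Hypothetical 3-state stochastic matrix and splitting weights.
C = [
    [F(0), F(2, 3), F(1, 3)],
    [F(2, 9), F(5, 9), F(2, 9)],
    [F(2, 3), F(1, 3), F(0)],
]
C4 = split_state(C, 1, [F(1), F(4, 5), F(0)])
merged = merge_columns(C4, 1)
```

After merging the two new columns, the two duplicated rows coincide and deleting one of them gives back `C`, which is the trivial lumpability mentioned above.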
In our example one may try to split the second state so as to have a 4-state 
system with matrices 


2 1 17 
0 a. a3 $ $ bu bs $ 
2 2 2 b b 2 
$ G2 05 f 9 22 22 $ 
C'(a) = 2 2l C'(b) = 2 2 
$ 42 05 5$ $ by bs $ 
1 1 
$ Ayn a 0 $ ban ba dj 


and the $a_{ij}$ and $b_{ij}$ will be determined by a set of equations requiring that:
(1) the sum of the two $a$ columns and of the two $b$ columns equals the
corresponding columns in the original matrices $C(a)$ and $C(b)$; (2) there is a
partition $\pi$, say $\pi = \{\{s_1, s_2\}, \{s_3, s_4\}\}$, which satisfies the lumpability condition; and (3)
there is a partition $\tau$, say $\tau = \{\{s_1, s_3\}, \{s_2, s_4\}\}$, such that $\pi$ and $\tau$ satisfy the
separability condition. Formulating these equations is an easy matter
and is left as an exercise. The resulting matrices are


1 2 1 o1 1 1 
03 0 3 3 3 g 6 
$ 5$ 6$ $ $3556 

y m b) = 

Ca) 2 4 1 2P C'(b) 2 1 4 2 
85 9 9 9 $9 9 9 9 
2 1 1 1 i1 1 
$040 cy 3 3. 


A decomposition of the system is now obtained in the same way as in Example 
13. The resulting decomposition (for z and t as specified above) is 


w- wp 
3 3. 3 3. 
san- senf f 

3 3 3 3. 
B(a, 2) = E | B(b, 2) = B | 


d. In deterministic machine theory, it has been proved that, by properly
splitting the states of an $n$-state machine, one can always decompose an
externally equivalent machine, in a cascade form, into two component machines, one
of them having a set of transition matrices which are either permutation or reset
matrices and the other having only $n - 1$ states. This fact has served as a basis
for the classical theorem of Krohn and Rhodes (1963) showing that every
deterministic machine can be "embedded" into a cascade interconnection of a
sequence of machines of a certain simple and canonical form. Unfortunately,
it seems reasonable to assume that the Krohn-Rhodes theorem does not carry
over, in its original form, to the stochastic case. One of the reasons for this is
that even if state splitting is allowed, the conditions for cascade decomposability
seem to be restrictive for stochastic systems and cannot always be met. Note,
however, that a cascade interconnection of a sequence of systems $A_1, A_2, \ldots,
A_k$ has the property that the next state of a system $A_i$ in the interconnection
depends on the present input, on its present state and on the present state of
all other systems $A_j$ with $j < i$, but does not depend on the present state of
any system $A_j$ with $j > i$. This means that the interconnectivity in the
decomposition is not maximal, a fact which has some advantage from the
realization point of view. We will show now that if the interconnectivity is allowed
to be maximal, then any $n$-state Markov system can be decomposed into a
sequence of 2-state Markov systems.


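Definition 2.6: Let
$A = (S, \{A(\sigma, t)\}_{\sigma \in \Sigma,\, t \in T})$ and $B = (T, \{B(\sigma, s)\}_{\sigma \in \Sigma,\, s \in S})$
be two Markov systems. The system $(S \times T, \{C(\sigma)\}_{\sigma \in \Sigma})$ is the maximal
interconnection of $A$ and $B$ if

$$C(\sigma) = [c_{(s,t)(s',t')}(\sigma)] \quad \text{and} \quad c_{(s,t)(s',t')}(\sigma) = a_{ss'}(\sigma, t)\, b_{tt'}(\sigma, s), \tag{14}$$

where $[a_{ss'}(\sigma, t)] = A(\sigma, t)$ and $[b_{tt'}(\sigma, s)] = B(\sigma, s)$.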

Thus, in a maximal interconnection, the next state of each system depends 
on the present state of both systems and on the present input. 
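
Property (14) translates directly into code. The sketch below handles one fixed input symbol; the function name, the indexing of the pair $(s, t)$ as `s * |T| + t`, and the sample matrices are assumptions made for illustration only.

```python
from fractions import Fraction as F

def maximal_interconnection(A, B):
    """Matrix C(sigma) of Eq. (14) for one fixed input symbol sigma.

    A[t] is the |S| x |S| matrix A(sigma, t), one for each state t of T.
    B[s] is the |T| x |T| matrix B(sigma, s), one for each state s of S.
    The pair (s, t) is stored at index s * |T| + t.
    """
    nS, nT = len(B), len(A)
    C = [[F(0)] * (nS * nT) for _ in range(nS * nT)]
    for s in range(nS):
        for t in range(nT):
            for s2 in range(nS):
                for t2 in range(nT):
                    C[s * nT + t][s2 * nT + t2] = A[t][s][s2] * B[s][t][t2]
    return C

# Hypothetical 2-state components (|S| = |T| = 2).
A = [
    [[F(1, 2), F(1, 2)], [F(1), F(0)]],      # A(sigma, t = 0)
    [[F(0), F(1)], [F(1, 3), F(2, 3)]],      # A(sigma, t = 1)
]
B = [
    [[F(1), F(0)], [F(1, 4), F(3, 4)]],      # B(sigma, s = 0)
    [[F(1, 2), F(1, 2)], [F(0), F(1)]],      # B(sigma, s = 1)
]
C = maximal_interconnection(A, B)
```

Each row of `C` is a product of one stochastic row of an `A` matrix with one stochastic row of a `B` matrix, so its entries sum to one.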

It is easily proved that a maximal interconnection of two Markov systems is
a Markov system. A maximal interconnection reduces to a cascade
interconnection if all the matrices of one of the two component systems corresponding
to the same input $\sigma$ are equal. Once the maximal interconnection of two
systems is formed, the resulting system can be further maximally interconnected
with a third system and so on. The resulting system will be called a maximal
interconnection of the sequence of systems involved. Definition 2.6 is illustrated
in Figure 15.


Figure 15. Graphical representation of a maximal interconnection of 
Markov systems. 


We are now able to state the following: 

Theorem 2.4: For each $n$-state Markov system $A = (S, \{A(\sigma)\})$ there exist two
systems, $B_1$ with state set $T_1$ containing two states and $B_2$ with state set $T_2$
containing $n - 1$ states, and a partition $\rho$ on the state set $T_1 \times T_2 = T$ of their
maximal interconnection $C = (T, \{C(\sigma)\})$ such that if states of $C$ belonging to
the same block of $\rho$ are merged, then the resulting system is equivalent to the
original given system $A$.

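Proof: Given the system $A$ with state set $S = \{s_1, \ldots, s_n\}$, split the state $s_n$
into $n - 1$ states $s_n^1, \ldots, s_n^{n-1}$, i.e., let $A'$ be a new system having state set
$S' = \{s_1, \ldots, s_{n-1}, s_n^1, \ldots, s_n^{n-1}\}$ and matrices $A'(\sigma) = [a'_{ij}(\sigma)]$.

Define the following two partitions over $S'$:

$$\pi = (\{s_1, \ldots, s_{n-1}\}, \{s_n^1, \ldots, s_n^{n-1}\}), \qquad \tau = (\{s_1, s_n^1\}, \{s_2, s_n^2\}, \ldots, \{s_{n-1}, s_n^{n-1}\}).$$

We shall define the matrices $A'(\sigma)$ in a way such that the above two partitions
will enable us to express the system $A'$ as a maximal interconnection of two
systems, a two-state system $B = (\pi, \{B(\sigma, \tau_j)\})$ whose states are the blocks of $\pi$,
and an $(n - 1)$-state system $B' = (\tau, \{B'(\sigma, \pi_i)\})$ whose states are the blocks of
$\tau$. In addition we will require that the partition $(\{s_1\}, \ldots, \{s_{n-1}\}, \{s_n^1, \ldots, s_n^{n-1}\})$
satisfy the lumpability condition for $A'$ so that after the states $s_n^1, \ldots, s_n^{n-1}$ are
merged, the resulting system is equivalent to the original system $A$. In order
to satisfy all the above conditions one must have that, for any fixed row in
$A'(\sigma)$, say the $i$th, the following equations hold (the state $s_n^k$ has index $k + n - 1$ in $S'$):

$$\text{(a)} \quad \Bigl(\sum_{s_j \in \pi_l} a'_{ij}(\sigma)\Bigr)\Bigl(\sum_{s_j \in \tau_m} a'_{ij}(\sigma)\Bigr) = a'_{ik}(\sigma) \quad \text{with } s_k = \pi_l \cap \tau_m$$

$$\text{(b)} \quad \sum_{j=n}^{2n-2} a'_{ij}(\sigma) = \begin{cases} a_{in}(\sigma) & \text{if } i < n \\ a_{nn}(\sigma) & \text{if } i \ge n \end{cases}$$

$$\text{(c)} \quad a'_{ij}(\sigma) = \begin{cases} a_{ij}(\sigma) & \text{if } i, j \le n - 1 \\ a_{nj}(\sigma) & \text{if } i \ge n,\ j \le n - 1 \end{cases}$$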

Equations (b) and (c) are necessary and sufficient for the lumpability
requirement while Eq. (a) is equivalent to property (14). This follows from the
fact that $\sum_{t'} c_{(s,t)(s',t')}(\sigma) = a_{ss'}(\sigma, t)$ in (14) is equivalent to $\sum_{s_j \in \pi_l} a'_{ij}(\sigma)$ here and
$\sum_{s'} c_{(s,t)(s',t')}(\sigma) = b_{tt'}(\sigma, s)$ in (14) is equivalent to $\sum_{s_j \in \tau_m} a'_{ij}(\sigma)$ here. Combining
these two equations one gets from (14) that

$$\Bigl(\sum_{t'} c_{(s,t)(s',t')}(\sigma)\Bigr)\Bigl(\sum_{s'} c_{(s,t)(s',t')}(\sigma)\Bigr) = c_{(s,t)(s',t')}(\sigma),$$

which is equivalent to Eq. (a) here. Now Eqs. (a), (b), and (c) above
uniquely determine the matrix $A'(\sigma)$ given the matrix $A(\sigma)$. Indeed for $i < n$,
$k \le n - 1$ we have by (a) that

$$\Bigl(\sum_{j=n}^{2n-2} a'_{ij}(\sigma)\Bigr)\bigl(a'_{ik}(\sigma) + a'_{i,k+n-1}(\sigma)\bigr) = a'_{i,k+n-1}(\sigma). \tag{15}$$

Using (b) and (c) we change this equation into the following equation, where
$a'_{i,k+n-1}(\sigma)$ is unknown and all the other values are known:

$$a_{in}(\sigma)\bigl(a_{ik}(\sigma) + a'_{i,k+n-1}(\sigma)\bigr) = a'_{i,k+n-1}(\sigma) \tag{16}$$

or, by transposing the second left summand to the right-hand side, we have

$$a_{in}(\sigma)\, a_{ik}(\sigma) = a'_{i,k+n-1}(\sigma)\,[1 - a_{in}(\sigma)] \tag{17}$$

thus

$$a'_{i,k+n-1}(\sigma) = \frac{a_{in}(\sigma)\, a_{ik}(\sigma)}{1 - a_{in}(\sigma)}. \tag{18}$$


Since $1 - a_{in}(\sigma) = \sum_{j<n} a_{ij}(\sigma) \ge a_{ik}(\sigma)$, both sides of the equation are
nonnegative. If $a_{in}(\sigma) = 0$, then $a'_{i,k+n-1}(\sigma) = 0$ and if $a_{in}(\sigma) = 1$, then $a'_{i,k+n-1}(\sigma)$
can be arbitrarily chosen provided that $\sum_{k=1}^{n-1} a'_{i,k+n-1}(\sigma) = 1$ and all the
summands are nonnegative. It follows that if the values $a'_{i,k+n-1}(\sigma)$, $i \le n$, $k \le n - 1$,
are chosen according to (18) then the requirements (a), (b), and (c) are
satisfied, since the derivation of (18) is reversible and (15) implies also the
following:

$$\Bigl(\sum_{j=1}^{n-1} a'_{ij}(\sigma)\Bigr)\bigl(a'_{ik}(\sigma) + a'_{i,k+n-1}(\sigma)\bigr) = \Bigl(1 - \sum_{j=n}^{2n-2} a'_{ij}(\sigma)\Bigr)\bigl(a'_{ik}(\sigma) + a'_{i,k+n-1}(\sigma)\bigr)$$

$$= a'_{ik}(\sigma) + a'_{i,k+n-1}(\sigma) - \Bigl(\sum_{j=n}^{2n-2} a'_{ij}(\sigma)\Bigr)\bigl(a'_{ik}(\sigma) + a'_{i,k+n-1}(\sigma)\bigr)$$

$$= a'_{ik}(\sigma) + a'_{i,k+n-1}(\sigma) - a'_{i,k+n-1}(\sigma) = a'_{ik}(\sigma)$$

as required. As for the case $i > n$, it follows from (c) that the first $n - 1$
entries in each such row must be equal to the corresponding entry in the $n$th
row and therefore by (18) this must be true for the full rows, i.e., the $n$th row
in $A'(\sigma)$ as determined by (c) and (18) must be duplicated $n - 1$ times. It
thus follows from the construction that the system $A' = (S', \{A'(\sigma)\})$ can be
represented as a maximal interconnection of the two systems $B = (T_1, \{B(\sigma, \tau_j)\})$
and $B' = (T_2, \{B'(\sigma, \pi_j)\})$ where the elements of $T_1$ and $T_2$ are the blocks of $\pi$
and $\tau$ respectively and the matrices $B(\sigma, \tau_j) = [b_{kl}(\sigma, \tau_j)]$ and $B'(\sigma, \pi_j) =
[b'_{kl}(\sigma, \pi_j)]$ are defined by

$$b_{kl}(\sigma, \tau_j) = \sum_{s_p \in \pi_l} a'_{ip}(\sigma), \quad s_i = \pi_k \cap \tau_j; \quad k, l = 1, 2; \quad j = 1, 2, \ldots, n - 1$$

and

$$b'_{kl}(\sigma, \pi_j) = \sum_{s_p \in \tau_l} a'_{ip}(\sigma), \quad s_i = \pi_j \cap \tau_k; \quad k, l = 1, 2, \ldots, n - 1; \quad j = 1, 2.$$

One sees easily from the construction that if $\rho$ is the partition $\rho = (\{s_1\}, \ldots,
\{s_{n-1}\}, \{s_n^1, s_n^2, \ldots, s_n^{n-1}\})$ then the system $A'$ is equivalent to $A$ when all states in
a block of $\rho$ are merged into a single state.
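
The construction in the proof is effective, and a minimal Python sketch may help to follow it. The function names, the 0-based indexing, the sample matrix and the particular choice made when $a_{in}(\sigma) = 1$ (all mass on the first copy) are assumptions of this sketch; the proof allows any nonnegative choice summing to one in that case.

```python
from fractions import Fraction as F

def split_last_state(A):
    """Build A' of order 2n - 2 from an n x n stochastic matrix A.

    The first n - 1 entries of each row are copied (condition (c)) and the
    last entry is distributed over the n - 1 copies of s_n by Eq. (18).
    """
    n = len(A)
    rows = []
    for i in range(n):
        a_in = A[i][n - 1]
        head = list(A[i][: n - 1])
        if a_in == 1:
            # Eq. (18) reads 0/0 here; put all the mass on the first copy.
            tail = [F(1)] + [F(0)] * (n - 2)
        else:
            tail = [a_in * a_ik / (1 - a_in) for a_ik in head]
        rows.append(head + tail)
    # The row obtained from row n of A is used for every copy of s_n.
    return rows[: n - 1] + [list(rows[n - 1]) for _ in range(n - 1)]

def satisfies_a(Ap):
    """Check condition (a) for pi = (first half, second half), tau_k = {k, k + m}."""
    m = len(Ap) // 2
    for row in Ap:
        pi_sums = [sum(row[:m]), sum(row[m:])]
        tau_sums = [row[k] + row[k + m] for k in range(m)]
        for l in range(2):
            for k in range(m):
                if pi_sums[l] * tau_sums[k] != row[l * m + k]:
                    return False
    return True

# Illustrative 3-state stochastic matrix for one input symbol.
A = [
    [F(0), F(0), F(1)],
    [F(1, 2), F(0), F(1, 2)],
    [F(0), F(1, 2), F(1, 2)],
]
Ap = split_last_state(A)
n = len(A)
merged_back = [row[: n - 1] + [sum(row[n - 1:])] for row in Ap[:n]]
```

Here `merged_back` re-adds the columns belonging to the copies of $s_n$ and reproduces `A`, which is the lumpability requirement expressed by (b) and (c).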

Corollary 2.5: For each $n$-state Markov system $A = (S, \{A(\sigma)\})$ there exist
$n - 1$ two-state systems $B_i$ with state sets $T_i$ respectively and a partition $\rho$ on
the state set $T = T_1 \times T_2 \times \cdots \times T_{n-1}$ of their maximal interconnection
$C = (T, \{C(\sigma)\})$ such that if states of $C$ belonging to the same block of $\rho$ are
merged, then the resulting system is equivalent to the original given system $A$.


Proof: By Theorem 2.4 and induction.

Example 15: Let $A = (S, \{A(\sigma)\})$ be the 3-state system over $\Sigma = \{a, b\}$ with


bad 0 0 1 
ad=; s t ABD=|4 0 4 
0 1 0 0 i 4 


Using (b), (c), and (18) we construct the system $A' = (S', \{A'(\sigma)\})$ with
id d di 00 10 
l 1 1 1 1 0 10 
Aa) = S 3 p 8 5 A'(b) = T: F 
010 0 0103 
0100 040 3 


so that $\rho = (\{s_1\}, \{s_2\}, \{s_3^1, s_3^2\})$, and $A'$ is equivalent to $A$ if states $s_3^1$ and $s_3^2$ are
merged. Let $\pi = (\pi_1, \pi_2) = (\{s_1, s_2\}, \{s_3^1, s_3^2\})$ and $\tau = (\tau_1, \tau_2) = (\{s_1, s_3^1\}, \{s_2, s_3^2\})$.
Using these partitions and the method outlined in the proof of Theorem 2.4,
the systems $B = (\pi, \{B(\sigma, \tau_j)\})$ and $B' = (\tau, \{B'(\sigma, \pi_j)\})$ are derived where


B(a, 1;) = H fl BG, T) — i | 


1 0 1 0 
0 1 ES 
B(b, n=l) 2 Bb) =l? | 
4 > 4 4. 
and 
2 1 0 1 
Ban) =|" B'(a, ZEN 1 
2 72 
gi 1 0 TEE 0 ] 
(b, x) = 1 0 , (b, 7, a 0 1 
EXERCISES 


1. Let $A = (S, \{A(\sigma)\})$ be an $n$-state Markov system and let $\pi = (\pi_1, \pi_2, \ldots, \pi_k)$
be a partition over $S$ satisfying the lumpability condition. Let $U$ be a stochastic
$k \times n$ matrix such that $U = [u_{ij}]$ and $u_{ij} \ne 0$ only if $s_j \in \pi_i$ [note that $U$ is
not unique]. Finally, let $V$ be an $n \times k$ stochastic matrix such that $V = [v_{ij}]$
and $v_{ij} = 1$ if and only if $s_i \in \pi_j$.

a. Prove that the system $\hat{A} = (\pi, \{U A(\sigma) V\})$ is a $k$-state Markov system where
the matrices $U A(\sigma) V$ represent the transition probabilities between the blocks
of $\pi$, i.e., $\hat{A}$ is the system derived from $A$ if the states belonging to the same
block of $\pi$ are merged into a single state.

b. Prove that for every $\sigma \in \Sigma$, $V U A(\sigma) V = A(\sigma) V$.

c. Prove that for all $x \in \Sigma^*$, $U A(x) V = \hat{A}(x)$ where by definition $\hat{A}(\sigma) =
U A(\sigma) V$ and $\hat{A}(x) = \hat{A}(\sigma_1) \cdots \hat{A}(\sigma_k)$ if $x = \sigma_1 \cdots \sigma_k$.

d. Let $\tau = (\tau_1, \ldots, \tau_k)$ be any partition on $S$. Let $U$ be a Markov matrix
$U = [u_{ij}]$, $u_{ij} \ne 0$ if and only if $s_j \in \tau_i$, and all nonzero entries in a row of $U$
are equal. Let $V$ be a matrix defined as above for $\tau$. If for every $\sigma \in \Sigma$,
$V U A(\sigma) V = A(\sigma) V$, then $\tau$ satisfies the lumpability condition.

e. Let $\rho = (\rho_1, \ldots, \rho_k)$ be any partition on $S$ such that there exists a Markov
matrix $U = [u_{ij}]$ with $u_{ij} \ne 0$ only if $s_j \in \rho_i$ satisfying $U A(\sigma) V U = U A(\sigma)$
for all $\sigma \in \Sigma$ ($V$ is defined as before, for the partition $\rho$). Then for all
$x \in \Sigma^*$, $U A(x) V = \hat{A}(x)$ where $\hat{A}(\sigma) = U A(\sigma) V$ and $\hat{A}(x) = \hat{A}(\sigma_1) \cdots \hat{A}(\sigma_k)$
if $x = \sigma_1 \cdots \sigma_k$.


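2. Let $(S, \{A(\sigma)\})$ be the following 4-state system over $\Sigma = \{a, b\}$

$$A(a) = \begin{pmatrix} 0.2 & 0.3 & 0.3 & 0.2 \\ 0.4 & 0.1 & 0 & 0.5 \\ 0.1 & 0.2 & 0.4 & 0.3 \\ 0.3 & 0 & 0.2 & 0.5 \end{pmatrix}, \qquad A(b) = \begin{pmatrix} 0.25 & 0.15 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.1 & 0.5 \\ 0.3 & 0.45 & 0.1 & 0.15 \\ 0.45 & 0.3 & 0.15 & 0.1 \end{pmatrix}$$

and let $\pi$ be the partition $\pi = (\{s_1, s_2\}, \{s_3, s_4\})$. Prove that $\pi$ satisfies the
lumpability condition; find corresponding $U$ and $V$ matrices and define the system
$(\pi, \{U A(\sigma) V\})$.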
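3. Let $A = (S, \{A(\sigma)\})$ be the following 4-state Markov system over $\Sigma =
\{a, b\}$

$$A(a) = \begin{pmatrix} 0.2 & 0.2 & 0.3 & 0.3 \\ 0 & 0.4 & 0 & 0.6 \\ 0.4 & 0.1 & 0.4 & 0.1 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}, \qquad A(b) = \begin{pmatrix} 0.3 & 0 & 0.7 & 0 \\ 0 & 0.3 & 0 & 0.7 \\ 0.08 & 0.12 & 0.32 & 0.48 \\ 0.06 & 0.14 & 0.24 & 0.56 \end{pmatrix}$$

and let $\pi = (\{s_1, s_2\}, \{s_3, s_4\})$, $\tau = (\{s_1, s_3\}, \{s_2, s_4\})$ be two partitions on $S$. Prove that
$\pi$ satisfies the lumpability condition and that $\pi$ and $\tau$ satisfy the separability
condition. Decompose the system $A$ accordingly in a cascade form.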


4. Let $A = (S, \{A(\sigma)\})$ be the following three-state system over $\Sigma = \{a, b\}$

$$A(a) = \begin{pmatrix} 0.2 & 0.5 & 0.3 \\ 0.3 & 0.55 & 0.15 \\ 0.45 & 0.45 & 0.1 \end{pmatrix}, \qquad A(b) = \begin{pmatrix} 0.3 & 0.7 & 0 \\ 0.06 & 0.38 & 0.56 \\ 0.14 & 0.62 & 0.24 \end{pmatrix}$$

Split the second state into two states so as to get a new system $A'$ which can be
decomposed into a cascade product of two 2-state Markov systems.


5. Prove that the maximal interconnection of two Markov systems is a Markov 
system, i.e., the next state of the interconnection depends on its present state 
but not on its previous history. 

6. Let $A = (S, \{A(\sigma)\})$ be the following 4-state Markov system over $\Sigma =
\{a, b\}$


4 440 1000 
1 1 1 0 i 4 1 
49-3 95 3} A(b) — =z 4d 34 
0110 $ $$ 3$ 


Apply Corollary 2.5 to this system and find the three corresponding 2-state
systems $B_1$, $B_2$, and $B_3$.


7. Let T, in Corollary 2.5 be T, = {t,;, ta}. Prove that the partition p in that 
corollary can be written in the following form: p = (p,--- Pa) with p, = 
((tis ta. oe toU...) P1 E Tu... m! e T, dif j<n—1 and 
Pa-1 = astu situe Pa = Fo stu) 

8. Consider the following:


Definition: Two partitions $\pi$ and $\tau$ for a system $A = (S, \{A(\sigma)\})$ are a partition
pair if for each pair of blocks $\pi_m$ and $\tau_n$, $\sum_{j \in \tau_n} a_{ij}(\sigma) = \sum_{j \in \tau_n} a_{kj}(\sigma)$ for all
$i, k \in \pi_m \cap \tau_l$, for each $l$ such that $\pi_m \cap \tau_l \ne \emptyset$ and for each $\sigma \in \Sigma$.

a. Prove that $\tau$ satisfies the lumpability condition if and only if $(\tau, \tau)$ is
a partition pair.

b. Prove that for any partition $\tau$, $(0, \tau)$ is a partition pair.

9. Prove that if a Markov system $A$ is deterministic [its matrices are
degenerate] and $\pi$ and $\tau$ are two partitions over $S$ such that $\pi \cdot \tau = 0$, then these
partitions satisfy the separability condition.


10. Prove the following: 


Theorem: A Markov system with state set $S$ is decomposable in a cascade
form if there exist partitions $\pi$, $\theta$, and $\tau$ on $S$ such that

a. $\pi$ satisfies the lumpability condition and $\theta \ge \pi$;

b. $\pi$ and $\tau$ satisfy the separability condition;

c. $(\theta \cdot \tau, \tau)$ is a partition pair [see Exercise 8 for the definition of a
partition pair].

Remark: The above theorem is a generalization of the "if" part of Theorem
2.1, taking care of the possibility of having a combinatorial gate [represented
by the partition $\theta$] between the output [i.e., the state] of the first component
in the decomposition and the second component. Note that if $\theta = \pi$ then
$\theta \cdot \tau = 0$ and $(0, \tau)$ is a partition pair [see Exercise 8] so that the third
condition of the theorem is superfluous.


11. Prove that if condition (c) in the theorem of Exercise 10 is deleted and the
requirement that the partition $\tau$ also satisfy the lumpability condition is added,
then the system satisfying the changed conditions can be decomposed into a
Kronecker product of two systems.


12. Formulate and prove a theorem generalizing the theorem in Exercise 10 
so as to include the possibility of decomposing a given system into a cascade 
product of more than two smaller [i.e., with fewer states] systems. 


OPEN PROBLEM 


Can every $n$-state Markov system be "embedded" in a nontrivial way into a
cascade type interconnection of systems which have a specific simple form?
In other words, is there any theorem which can be proved for Markov
systems and which parallels in some way the Krohn-Rhodes theorem for the
deterministic case?


3. Bibliographical Notes 


Operations such as "Kronecker product" or direct sum for matrices can be
found in any standard textbook, e.g., MacDuffee (1964). Section 1 here is based
on Paz (1966) and Bacon (1964). Decomposition of deterministic machines has
been dealt with by many authors. An exposition of that theory (including the
Krohn and Rhodes (1963) theory) can be found in Hartmanis and Stearns (1966)
and Ginzburg (1968). Lumpability for homogeneous Markov chains has been
dealt with in the book of Kemeny and Snell (1960). Decomposition of
stochastic automata was first studied by Bacon (1964). The possibility of state
splitting for stochastic machines was considered first by Fujimoto and Fukao
(1966).

Theorem 2.4 and Corollary 2.5 here are based on Paz (1970b). Finally,
Heller (1967) considered some aspects of decomposition theory for stochastic
automata from the point of view of the theory of categories and a similar
approach was undertaken by Depeyrot (1968), who studied various types of
decompositions, including some interesting particular cases. Additional
references: Gelenbe (1969a), Kuich and Walk (1966a), Kuich (1966).


C. WORD-FUNCTIONS 


Let $f$ be a function

$$f : \Sigma^* \to R \tag{19}$$

where $\Sigma$ is a given alphabet and $R$ is the set of real numbers. Functions of the
form (19) will be called word functions. There are at least three ways to relate
word functions to Markov chains. First [see Definition 1.1 in Section I, C], if
an input-output relation [induced by an SSM] is restricted in a way such that
the input alphabet $X$ contains a single letter and $Y = \Sigma$, then the resulting
function is a word function.

If $f$ is induced by the SSM $A = (S, \pi, \{A(y)\}_{y \in Y}, \eta)$, then $f(v) = \pi A(v) \eta$
with $v \in Y^* = \Sigma^*$, and the matrices $A(y)$ have the property that $\sum_y A(y)$ is
stochastic. This case has been dealt with in Section I, C, 1. In the next two
sections we shall consider two additional ways of relating word functions to
Markov chains.


1. Functions of Markov Chains


a. Preliminaries 


Let $(\pi, S, A)$ be a [homogeneous] Markov chain with [finite] state set $S$, initial
distribution $\pi$ and transition matrix $A$. Let $\Sigma$ be a partition on $S$. We shall use
the following notations: the elements of $S$ [states] are denoted by $s$, indexed if
necessary; sequences of states are denoted by $u$, indexed if necessary; elements
of $\Sigma$ [blocks of the partition] are denoted by $\sigma$, indexed if necessary; finally,
sequences of blocks of $\Sigma$ are denoted by $v$, indexed if necessary.

The chain being discrete, $s(t)$ and $\sigma(t)$ denote the state of the chain and its
corresponding block at time $t$ and, for $u = s_1 \cdots s_j$ and $v = \sigma_1 \cdots \sigma_k$, $p(u)$ and
$p(v)$ denote the probability that $s(1) = s_1, \ldots, s(j) = s_j$ and $s(1) \in \sigma_1, \ldots,
s(k) \in \sigma_k$ respectively. Let $u_1, u_2, v_1, v_2$ be sequences of states and symbols in $S$
and $\Sigma$ respectively, $u_1 = s_1 \cdots s_i$, $u_2 = s_1' \cdots s_j'$, $v_1 = \sigma_1 \cdots \sigma_k$, $v_2 = \sigma_1' \cdots
\sigma_l'$. Then $p(u_1 S^r u_2)$ and $p(v_1 \Sigma^q v_2)$ denote the probabilities that $s(1) = s_1,
\ldots, s(i) = s_i$, $s(i + r + 1) = s_1', \ldots, s(i + r + j) = s_j'$ and $s(1) \in \sigma_1, \ldots,
s(k) \in \sigma_k$, $s(k + q + 1) \in \sigma_1', \ldots, s(k + q + l) \in \sigma_l'$ respectively. A Markov
chain and a function $p(v)$ as above are stationary if $p(S^r u) = p(u S^r) = p(u)$
and $p(\Sigma^r v) = p(v \Sigma^r) = p(v)$ respectively, i.e., if the probability of being in a
specific state at time $t$ is independent of time. Any function $p(v)$ as above with
domain $\Sigma^*$ [$p(\lambda) = 1$, by definition] and range in the interval $[0, 1]$ is called a
function of a Markov chain and the elements of $\Sigma$ are its states. Trivially, a
Markov chain is stationary if and only if $\pi A = \pi$ [if this is the case, then $\pi$ is
called a stationary distribution for $A$] and a function of a stationary Markov
chain is stationary. [The converse is, however, not necessarily true.] If $B$ is any
square matrix of the same order as $A$, then $B_{\sigma_i \sigma_j}$ denotes the submatrix of $B$ with
rows in $\sigma_i$ and columns in $\sigma_j$. If $\xi$ and $\eta$ are $|S|$-dimensional row and column
vectors respectively, then $\xi_{\sigma_i}$ and $\eta_{\sigma_j}$ denote the subvectors corresponding to the
elements in $\sigma_i$ and $\sigma_j$ respectively. The symbol $\eta$ will denote as before an
$|S|$-dimensional column vector all the entries of which are equal to one.

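We shall prove now some simple properties of functions of Markov chains.
If $(\pi, S, A)$ is a Markov chain and $p$ is a function of it with state set $\Sigma$, then

1. $p(v_1 \Sigma^q v_2) = \sum_{v \in \Sigma^q} p(v_1 v v_2)$.

2. If in particular $v_2$ is $\lambda$, then $p(v_1 \Sigma^q) = \sum_{v \in \Sigma^q} p(v_1 v) = p(v_1)$.

3. If $v = \sigma_1 \cdots \sigma_k$, then

$$p(v) = \pi_{\sigma_1} A_{\sigma_1 \sigma_2} A_{\sigma_2 \sigma_3} \cdots A_{\sigma_{k-1} \sigma_k} \eta_{\sigma_k}. \tag{20}$$

4. Denote $A_{\sigma_1 \sigma_2} A_{\sigma_2 \sigma_3} \cdots A_{\sigma_{k-1} \sigma_k} = A_{\sigma_1 \cdots \sigma_k}$ [thus $A_{v \sigma v'} = A_{v \sigma} A_{\sigma v'}$] and, for a
word $v$ beginning with $\sigma_1$ and ending with $\sigma_k$, $\pi_{\sigma_1} A_v = \pi_v$ and $A_v \eta_{\sigma_k} = \eta_v$. Then,
if $v \sigma v'$ begins with $\sigma_1$ and ends with $\sigma_l'$,

$$p(v \sigma v') = \pi_{\sigma_1} A_{v \sigma} A_{\sigma v'} \eta_{\sigma_l'} = \pi_{v \sigma} \eta_{\sigma v'}. \tag{21}$$

5. If for some $v$, $p(v) = 0$, then for any $v'$, $p(v v') = 0$.

Proofs are trivial and left to the reader.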
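
Formula (20) can be evaluated by propagating a row vector from block to block. The following Python sketch is illustrative only: the function name `p`, the 0-based state indices, the representation of a word as a list of block indices and the sample chain are assumptions made here.

```python
from fractions import Fraction as F

def p(pi, A, partition, v):
    """p(v) by formula (20); v is a list of block indices, the empty list is lambda."""
    if not v:
        return F(1)
    # Subvector of pi on the first block, stored as {state: value}.
    vec = {s: pi[s] for s in partition[v[0]]}
    for nxt in v[1:]:
        # Multiply by the submatrix with rows in the current block, columns in the next.
        vec = {t: sum(val * A[s][t] for s, val in vec.items()) for t in partition[nxt]}
    # Multiplying by the all-ones subvector eta just adds up the entries.
    return sum(vec.values())

# Hypothetical 3-state chain with blocks sigma_0 = {0} and sigma_1 = {1, 2}.
pi = [F(1, 2), F(1, 4), F(1, 4)]
A = [
    [F(0), F(1, 2), F(1, 2)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1), F(0), F(0)],
]
blocks = [[0], [1, 2]]
```

Summing `p(pi, A, blocks, v + [s])` over the block indices `s` returns `p(pi, A, blocks, v)`, which is Property 2 with $q = 1$.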


Remark: It follows from formula (20) above that there is a time lag between
a function $p(v)$ when considered as an input-output relation with a single input
letter [Section I, C] and the same function when considered as a function of a Markov
chain, e.g., $p(\sigma_i \sigma_j) = \pi A(\sigma_i) A(\sigma_j) \eta$ in the first case and $p(\sigma_i \sigma_j) = \pi_{\sigma_i} A_{\sigma_i \sigma_j} \eta_{\sigma_j}$
in the second case. This difference is made clear when single symbols are
considered, for $p(\sigma_i) = \pi A(\sigma_i) \eta$ in the first case is the probability of having
output $\sigma_i$ after the process was started and moved into a next state, while
$p(\sigma_i) = \pi_{\sigma_i} \eta_{\sigma_i}$ in the second case is the probability of having output $\sigma_i$ to
begin with, even before the process moved into a new state. This time lag is
responsible for the differences between the results in the next section and the
parallel results in Section I, C, 1.


b. The Rank of a Function of a Markov Chain 


Definition 1.1: Let $p$ be a function of a Markov chain with state set [of $p$] $\Sigma$.
Let $\sigma \in \Sigma$ and $v_1, \ldots, v_k, v_1', \ldots, v_l' \in \Sigma^*$. Then $P_\sigma(v_1 \cdots v_k; v_1' \cdots v_l')$ is the
$k \times l$ matrix [to be called a compound sequence matrix for $p$] whose $ij$ element
is $p(v_i \sigma v_j')$, and $r(P_\sigma(v_1 \cdots v_k; v_1' \cdots v_l'))$ is its rank.


Definition 1.2: Let $p$ and $\Sigma$ be as in Definition 1.1. Then, for $\sigma \in \Sigma$, the rank
of $\sigma$ [to be denoted by $r(\sigma)$] is defined as

$$r(\sigma) = \sup\,[\, r(P_\sigma(v_1 \cdots v_k; v_1' \cdots v_l')) : k, l = 1, 2, \ldots;\ v_1, \ldots, v_k, v_1', \ldots, v_l' \in \Sigma^* \,].$$

Thus $r(\sigma)$ is the maximal rank of a matrix of the form $P_\sigma(v_1 \cdots v_k; v_1' \cdots v_l')$
if such a maximal rank exists; the rank of $p$ [to be denoted by $r(p)$] is defined
as the sum of the ranks of its states.

In the following theorems we shall use some arguments very similar to the
arguments used in Section I, C. Some results, parallel to results proved in that
section, will be taken for granted here. The reader is referred to that section for
details.

Theorem 1.1: Let $p$ and $\Sigma$ be as in Definition 1.1. For $\sigma \in \Sigma$, $r(\sigma) \le |\sigma|$ with
the consequence that $r(p) \le |S|$, where $|\sigma|$ is the number of states in $S$
belonging to the block $\sigma$ when $\Sigma$ is considered as a partition on the state set $S$ of the
underlying Markov chain.


Proof: Any compound sequence matrix $P_\sigma(v_1 \cdots v_k; v_1' \cdots v_l')$ is the
product of two matrices: a left factor matrix $G_\sigma$ whose rows are the vectors
$\pi_{v_i \sigma}$ and a right factor matrix $H_\sigma$ whose columns are the vectors $\eta_{\sigma v_j'}$ [see
(21)]. But the $\pi_{v_i \sigma}$ are $|\sigma|$-dimensional vectors for any $v_i$ and similarly the $\eta_{\sigma v_j'}$
are $|\sigma|$-dimensional vectors. Thus

$$r(P_\sigma(v_1 \cdots v_k; v_1' \cdots v_l')) = r(G_\sigma H_\sigma) \le \min(r(G_\sigma), r(H_\sigma)) \le |\sigma|$$

and $r(p) = \sum_{\sigma \in \Sigma} r(\sigma) \le |S|$.
Corollary 1.2: If $\sigma \in \Sigma$ is a state of $p$ such that $|\sigma| = 1$ and $r(\sigma) \ne 0$ [$r(\sigma) = 0$
implies that $p(v \sigma v') = 0$ for any $v$ and $v'$, which means that the state can be
discarded], then $r(\sigma) = 1$.

We shall need also the following:


Proposition 1.3: If $\sigma$ is a state of $p$ such that $r(\sigma) = 1$, then for any $v, v' \in \Sigma^*$

$$p(v \sigma v')\, p(\sigma) = p(v \sigma)\, p(\sigma v'). \tag{22}$$

Proof: Since $r(\sigma) = 1$, we have that $r(P_\sigma(\lambda, v; \lambda, v')) \le 1$ or

$$\begin{vmatrix} p(\sigma) & p(\sigma v') \\ p(v \sigma) & p(v \sigma v') \end{vmatrix} = 0$$

from which (22) follows immediately.
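
Relation (22) can be checked numerically in the simplest situation where it must hold, namely when every block is a single state, so that $r(\sigma) \le |\sigma| = 1$ by Theorem 1.1. The Python sketch below is an illustration under assumptions: the function name `p`, the encoding of a word as a list of block indices and the sample chain are hypothetical.

```python
from fractions import Fraction as F

def p(pi, A, partition, v):
    """p(v) by formula (20); v is a list of block indices."""
    if not v:
        return F(1)
    vec = {s: pi[s] for s in partition[v[0]]}
    for nxt in v[1:]:
        vec = {t: sum(val * A[s][t] for s, val in vec.items()) for t in partition[nxt]}
    return sum(vec.values())

# Hypothetical 3-state chain observed through the trivial partition.
pi = [F(1, 2), F(1, 4), F(1, 4)]
A = [
    [F(0), F(1, 2), F(1, 2)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1), F(0), F(0)],
]
singletons = [[0], [1], [2]]

def det_22(sigma, v, v2):
    """Determinant of the compound sequence matrix P_sigma(lambda, v; lambda, v2)."""
    q = lambda w: p(pi, A, singletons, w)
    return q([sigma]) * q(v + [sigma] + v2) - q([sigma] + v2) * q(v + [sigma])
```

For this chain `det_22(1, [0], [2])` is zero, in agreement with (22); with a partition containing larger blocks the determinant need not vanish.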


Remark: A function $p$ of a Markov chain is called regular if $r(p) = |S|$. It
follows from Proposition 1.3 above that in the degenerate case where $S = \Sigma$
[the partition on $S$ is trivial], i.e., if a Markov chain itself is considered as a
function of a Markov chain, this function is regular provided that all its states
are accessible [$r(s) \ne 0$ for all $s \in S$].


c. Probabilistic Sequential Functions over $\Sigma^*$


In this section we shall consider probabilistic word functions over $\Sigma^*$ given
in some arbitrary way [i.e., not necessarily induced by Markov chains]. By
"probabilistic sequential functions" we mean word functions $f$ with domain $\Sigma^*$
satisfying the following conditions:

$$f(\lambda) = 1 \tag{23}$$

$$\sum_{\sigma \in \Sigma} f(v \sigma) = f(v), \quad v \in \Sigma^* \tag{24}$$

$$0 \le f(v) \le 1, \quad v \in \Sigma^* \tag{25}$$

If property (25) is not satisfied but properties (23) and (24) are, then the
function is called sequential. By "given" functions we mean functions such
that the values $f(v)$ can be computed effectively [there exists an algorithm for
computing them] for every $v \in \Sigma^*$.

The rank of a (probabilistic) sequential function is defined as in Definition
1.2 in Section b above, that definition being independent of the existence of
an underlying Markov chain for the given function. The following lemma
parallels Lemma 2.2 in Section I, C. The proof, which is similar to the proof
of that lemma, is omitted.


Lemma 1.4. Let $f$ be a sequential function of finite rank and let $P_\sigma(v_1 \cdots v_k;
v_1' \cdots v_k')$ be a given compound sequence matrix of maximal rank for $f$ and
$\sigma \in \Sigma$. Another compound sequence matrix of the same rank can be derived
from the given one having the form $P_\sigma(\lambda, \bar{v}_2, \ldots, \bar{v}_k; \lambda, \bar{v}_2', \ldots, \bar{v}_k')$.

Definition 1.3. A [finite] pseudo Markov chain is a system $(\pi, S, A, \eta)$ where
$\pi$, $S$, and $A$ are as in a Markov chain but $\pi$ and $A$ are not necessarily stochastic
and $\eta$ is an $|S|$-dimensional arbitrary column vector satisfying the equation
$\pi \eta = 1$.

For $u = s_1 \cdots s_k \in S^*$ the values $p(u)$ induced by a pseudo Markov chain
are defined as $p(\lambda) = \pi \eta = 1$ and $p(s_1 \cdots s_k) = \pi_{s_1} A_{s_1 s_2} \cdots A_{s_{k-1} s_k} \eta_{s_k}$, where
$\pi_{s_1}$, $\eta_{s_k}$ are the $s_1$ and $s_k$ entries in $\pi$ and $\eta$ respectively and $A_{s_i s_j}$ is the $s_i s_j$ entry in
$A$ [$p(u)$ will sometimes be called a pseudoprobability].

If $\Sigma$ is a partition on the state set of a pseudo Markov chain, then a function
$f$ over $\Sigma^*$ with state set $\Sigma$ defined by $f(\lambda) = 1$ and $f(\sigma_1 \cdots \sigma_k) = \pi_{\sigma_1} A_{\sigma_1 \sigma_2} \cdots
A_{\sigma_{k-1} \sigma_k} \eta_{\sigma_k}$ is called a function of a pseudo Markov chain.

We are now able to prove the following: 

Theorem 1.5: Any [probabilistic] sequential function of finite rank is a function 
of a pseudo Markov chain. 

Proof: By the finite rank assumption and by Lemma 1.4, there exist, for 
each ø € X, regular matrices P,(A, v,, - - - v, As Uo, © * Vao) with k(o)=r(0). 
We shall denote those fixed matrices by P,, and use also the following additional 
notations for c, ô € Zand v e X* 


P,(v) = P(A, Vor esos Vek(a)s v, UU 86 ag Viro): PsA) = P, (26) 


P,{v) = P(r) (27) 

[Note that P,(A) = P, as defined above.] 
P, (0) = P(A, Ver) + + +» Varios V) PaA) = P. (28) 
Posl t) = P (À; v, vt, . . ., vto); Presl) = Pis (29) 


Thus P,,(v) and P,,,(v) are the first column and row of P,,(v) respectively. 
Using a procedure similar to the one used in Section I,C,3, one can prove 
the following relations 
P5,(vav'a") = As, P, (v'0") (30) 
To prove this we consider an arbitrary column, the jth one, in the relation 
(30) which, by (28), has the form 
P, (vov'a'vy;) = As P. VT vj) (31) 
the ith element in (31) has the form 
f(vsÓvav'o'v' y) = Y a(0vo)f(v,,ov'o' vy) (32) 
and (32) follows from the fact that 
fKav'a'v;) 
f(v,,ov'a'v;,) 


P, 0 (33) 


Sexe) ov'a'vy) 
Slvs,6v0) flvs,dvev,,) +++ f(vs,dv0v'o'vy;) 
Since |P,| = 0, and one can develop the above determinant (33) according to 
its last column and represent the last element in the column as a combination 
of the others. Note that the coefficients of the combination depend on the three 
variables 6, v, and ø only [for fixed i and j] so that they can be denoted by 
a, (óvo). Equation 30 is therefore proved with 4,,, = [a,,(dve)]. 
Consider again Eq. (30) with v = v',o’ = À, and ó' = ø. The resulting equa- 
tion will be 


P(0) = Ass Poo = As, P, or Asa = P4,(0)P;' (34) 


Equation (34) can now be used for computing the matrices A,,. If we set 
in (30) ó' = a, v'o' = A, we get the equation 


P5,(vo) = Apo Ps (35) 
Replacing 6’ = o' in (30) results in 
Ps, (vav'a") = Avo P, (v'o") (36) 


Using (35) in both sides of (36) gives 
Aspav'a’ Po’ = Asso Aavo Po or Asvovo' — Asse Áo (37) 


Equation (37) can now be used for computing the matrices A,, |v| > 2 from 
the matrices A,,. 
Finally, the first column in (35) has the form 
Polvo) = As, Por (38) 
Note that the first entry in P4(vo) is f(óvo). 
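Let $\pi_\sigma$ be a $k(\sigma)$-dimensional row vector of the form
$\pi_\sigma = (1\ 0\ \cdots\ 0)$. Let $\pi$ be the $r(f)$-dimensional row vector
$\pi = (\pi_{\sigma_1}, \pi_{\sigma_2}, \ldots, \pi_{\sigma_k})$ where
$\sigma_1, \ldots, \sigma_k$ is the sequence of elements of $\Sigma$ ordered in some
arbitrary but fixed order. Let $A$ be the $r(f) \times r(f)$ matrix formed from
the matrices $A_{\sigma\delta}$

$$A = \begin{bmatrix}
A_{\sigma_1\sigma_1} & \cdots & A_{\sigma_1\sigma_k} \\
\vdots & & \vdots \\
A_{\sigma_k\sigma_1} & \cdots & A_{\sigma_k\sigma_k}
\end{bmatrix}$$

Let $\eta$ be the $r(f)$-dimensional column vector
$\eta = (P_{\sigma_1 1}^T, \ldots, P_{\sigma_k 1}^T)^T$ [so that the block of $\eta$
corresponding to $\sigma$ is $\eta_\sigma = P_{\sigma 1}$]; then, by (38),

$$\pi_\delta A_{\delta v\sigma}\eta_\sigma = (1\ 0 \cdots 0)\,A_{\delta v\sigma}P_{\sigma 1} = (1\ 0 \cdots 0)\,P_{\delta 1}(v\sigma) = f(\delta v\sigma)$$

and $\pi\eta = \sum_\sigma f(\sigma) = 1$, so that $f$ is a function of the pseudo
Markov chain $(\pi, S, A, \eta)$ above with $|S| = r(f)$. ∎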
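
The way a (pseudo) Markov chain with a partition over its states induces a word
function can be made concrete with a short computation. The following Python
sketch is only an illustration: the three-state chain, the block names `a` and
`b`, and the helper `word_value` are hypothetical and are not taken from the
text. It evaluates $f(\sigma_1 \cdots \sigma_n) = \pi_{\sigma_1} A_{\sigma_1\sigma_2} \cdots A_{\sigma_{n-1}\sigma_n} \eta_{\sigma_n}$
with exact rational arithmetic.

```python
from fractions import Fraction as F
from itertools import product


def word_value(pi, A, eta, block_of, word):
    """Return f(word) = pi_{s1} A_{s1 s2} ... A_{s(n-1) sn} eta_{sn}.

    block_of[s] names the block of the partition containing state s;
    `word` is a sequence of block names. The empty word gets pi * eta.
    """
    n = len(pi)
    if not word:
        return sum(pi[s] * eta[s] for s in range(n))
    # Keep only the initial weights of the states lying in the first block.
    row = [pi[s] if block_of[s] == word[0] else F(0) for s in range(n)]
    for sym in word[1:]:
        # One transition step, then restrict to the states of block `sym`.
        row = [sum(row[r] * A[r][s] for r in range(n)) if block_of[s] == sym else F(0)
               for s in range(n)]
    return sum(row[s] * eta[s] for s in range(n))


# Hypothetical 3-state true Markov chain; blocks a = {s0, s1}, b = {s2}.
pi = [F(1, 2), F(1, 4), F(1, 4)]
A = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)]]
eta = [F(1)] * 3
block_of = ["a", "a", "b"]


def f(word):
    return word_value(pi, A, eta, block_of, tuple(word))


print(f("a"), f("b"), f("ab"))
```

Because $A\eta = \eta$ in this illustration, summing $f(v\sigma)$ over the blocks
$\sigma$ gives back $f(v)$, which the sketch allows one to check word by word.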


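Corollary 1.6: Let $(\pi, S, A, \eta)$ be a pseudo Markov chain as derived in
Theorem 1.5 for a given function $f$ of finite rank. Then $A\eta = \eta$ and if
$f$ is stationary [i.e., $\sum_\sigma f(\sigma v) = f(v)$] then $\pi A = \pi$. Let
$G_\sigma$ and $H_\sigma$ be the matrices whose rows and columns, respectively, are
$\pi_{v_{\sigma i}\sigma}$ and $\eta_{\sigma v'_{\sigma j}}$. Then $G_\sigma$ and
$H_\sigma$ are nonsingular having the same rank as $P_\sigma$
($P_\sigma = G_\sigma H_\sigma$).

Proof: By (34), $A_{\delta\sigma} = P_{\delta\sigma}(\sigma)P_\sigma^{-1}$, so that
by (29) $\pi_\delta A_{\delta\sigma} = P_{1\delta\sigma}(\sigma)P_\sigma^{-1}$ and
$\sum_\delta \pi_\delta A_{\delta\sigma} = \bigl(\sum_\delta P_{1\delta\sigma}(\sigma)\bigr)P_\sigma^{-1}$
[here $P_{1\delta\sigma}(\sigma)$ and $P_{1\sigma}$ denote the first rows of
$P_{\delta\sigma}(\sigma)$ and $P_\sigma$]. But if $f$ is stationary then

$$\sum_\delta P_{1\delta\sigma}(\sigma) = \sum_\delta \bigl(f(\delta\sigma), f(\delta\sigma v'_{\sigma 2}), \ldots, f(\delta\sigma v'_{\sigma k(\sigma)})\bigr)
= \bigl(f(\sigma), f(\sigma v'_{\sigma 2}), \ldots, f(\sigma v'_{\sigma k(\sigma)})\bigr) = P_{1\sigma}$$

Thus $\sum_\delta \pi_\delta A_{\delta\sigma} = P_{1\sigma}P_\sigma^{-1} = \pi_\sigma$
and this implies that $\pi A = \pi$, proving the second part of the corollary.
For the first part we have by (38) that
$P_{\delta 1}(\sigma) = A_{\delta\sigma}P_{\sigma 1} = A_{\delta\sigma}\eta_\sigma$.
Therefore,

$$\sum_\sigma A_{\delta\sigma}\eta_\sigma = \sum_\sigma P_{\delta 1}(\sigma) = \sum_\sigma \bigl(f(\delta\sigma), f(v_{\delta 2}\delta\sigma), \ldots, f(v_{\delta k(\delta)}\delta\sigma)\bigr)^T
= \bigl(f(\delta), f(v_{\delta 2}\delta), \ldots, f(v_{\delta k(\delta)}\delta)\bigr)^T = \eta_\delta$$

This implies that $A\eta = \eta$. To prove the last part of the corollary, we
remark that the rows of $G_\sigma$ and the columns of $H_\sigma$ are
$r(\sigma)$-dimensional and, since $P_\sigma = G_\sigma H_\sigma$ and $P_\sigma$ is
a nonsingular $r(\sigma) \times r(\sigma)$ matrix, $G_\sigma$ cannot have more than
$r(\sigma)$ rows and $H_\sigma$ cannot have more than $r(\sigma)$ columns and both
matrices must be nonsingular. This completes the proof. ∎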


Consider again Theorem 1.1. It is clear that the theorem remains true if f 
is a function of a pseudo Markov chain. Combining Theorem 1.1 with Theorem 
1.5 results in the following: 


Theorem 1.7: A sequential function $f$ over $\Sigma^*$ is a function of a pseudo
Markov chain if and only if it is of finite rank.


d. Construction of the Underlying Pseudo Markov Chain 


In order to be able to construct the underlying pseudo Markov chain for a
given function of finite rank $f$, one can use Theorem 1.5 provided that the
matrices $P_\sigma$ can be found for each $\sigma \in \Sigma$ and provided that the
values of the function $f$ can be computed effectively for the arguments
contained in the matrices $P_\sigma$ and $P_{\delta\sigma}(\sigma)$ [see (34)]. In
fact the function $f$ is uniquely determined by its values contained in the
matrices $P_\sigma$ and $P_{\delta\sigma}(\sigma)$ only. We shall show in this
section that the matrices $P_\sigma$ [and therefore also
$P_{\delta\sigma}(\sigma)$] can be determined effectively if a bound is given on
the rank of $f$. If it is only known that $f$ is of finite rank but no bound is
given for its rank, then the actual rank of $f$ cannot be determined and the
matrices $P_\sigma$ cannot be found in general. [See the remarks at the end of
Section I,C,3.] The matrices $P_\sigma$ for the bounded case can be found by using
the following:


Theorem 1.8: Let $f$ be a sequential function of rank $k$ and let
$\sigma \in \Sigma$ be one of its states whose rank is $k(\sigma)$. A nonsingular
matrix $P_\sigma = [f(v_i\sigma v'_j)]$ can be found such that
$l(v_i\sigma v'_j) \le 2(k - |\Sigma|) + 1$.


Proof: By Theorem 1.5, $f$ can be represented as a function of a pseudo
Markov chain $(\pi, S, A, \eta)$ with $|S| = k$. Consider the set of all vectors
of the form $\pi_{v\sigma}$ [see (21)]. Those vectors are $r(\sigma)$-dimensional
row vectors and therefore, using a procedure similar to the one used in Section
I,B,1 [see Exercise 5 at the end of that section], one can find a basis for those
vectors, $\pi_{v_1\sigma}, \ldots, \pi_{v_{k(\sigma)}\sigma}$, such that
$l(v_i\sigma) \le r(\sigma)$, $i = 1, 2, \ldots, k(\sigma)$. Let the matrix whose
rows are $\pi_{v_i\sigma}$ be denoted by $G_\sigma$; then $r(G_\sigma) = r(\sigma)$.
Using the same argument for vectors of the form $\eta_{\sigma v'}$, which are
$r(\sigma)$-dimensional column vectors, one can find a matrix $H_\sigma$ such that
$r(H_\sigma) = r(\sigma)$, its columns are a basis for all the vectors of the
form $\eta_{\sigma v'}$ [thus there are $r(\sigma)$ columns in $H_\sigma$] and any
of its columns $\eta_{\sigma v'}$ has the property that $l(\sigma v') \le r(\sigma)$.
Consider the matrix $G_\sigma H_\sigma$. It is an $r(\sigma) \times r(\sigma)$
square matrix of rank $r(\sigma)$ [since $r(G_\sigma) = r(H_\sigma) = r(\sigma)$]
and therefore nonsingular. Its entries are of the form
$\pi_{v_i\sigma}\eta_{\sigma v'_j} = f(v_i\sigma v'_j)$. Thus, $G_\sigma H_\sigma$
is a matrix satisfying the requirements of a $P_\sigma$ matrix and its elements
$f(v_i\sigma v'_j)$ have the property that
$l(v_i\sigma v'_j) = l(v_i) + l(v'_j) + l(\sigma) \le 2r(\sigma) - 1$. But
$r(\sigma) \le r(f) - |\Sigma| + 1$ [since
$r(f) = \sum_{\sigma \in \Sigma} r(\sigma) \ge r(\sigma) + |\Sigma| - 1$] and
therefore
$l(v_i\sigma v'_j) \le 2(r(f) - |\Sigma| + 1) - 1 = 2(r(f) - |\Sigma|) + 1$. ∎

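Corollary 1.9: If $f$ is a sequential function of rank $k$ and state set
$\Sigma$, then the values $f(v)$ with $l(v) \le 2(k - |\Sigma| + 1)$ uniquely
determine the function.


Proof: The matrices $P_\sigma$ can be found using only values $f(v)$ with
$l(v) \le 2(k - |\Sigma|) + 1$ and the matrices $P_{\delta\sigma}(\sigma)$ have
entries of the form $f(v_{\delta i}\,\delta\sigma\,v'_{\sigma j})$ with
$l(v_{\delta i})$ and $l(v'_{\sigma j})$ smaller than or equal to $r(\delta) - 1$
and $r(\sigma) - 1$ respectively [see the proof of Theorem 1.8]. Thus
$l(v_{\delta i}\,\delta\sigma\,v'_{\sigma j}) \le (r(\delta) - 1) + (r(\sigma) - 1) + 2 = r(\delta) + r(\sigma) \le 2(k - |\Sigma| + 1)$.
But the matrices $P_\sigma$ and $P_{\delta\sigma}(\sigma)$ uniquely determine the
function $f$ [see Theorem 1.5] and this completes the proof. ∎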


e. Equivalent Functions 


Definition 1.4: Let $\mathcal{M} = (\pi, S, A, \eta)$ and
$\mathcal{M}' = (\pi', S', A', \eta')$ be two pseudo Markov chains and let
$\Sigma$ and $\Sigma'$ be two partitions on $S$ and $S'$ respectively such that
$|\Sigma| = |\Sigma'|$. $\mathcal{M}$ and $\mathcal{M}'$ are equivalent with
respect to $\Sigma$ and $\Sigma'$ if there is a one-to-one mapping
$\varphi: \Sigma \to \Sigma'$ such that $f(v) = f'(v')$ for all
$v \in \Sigma^*$, $v' \in \Sigma'^*$, where $f$ and $f'$ are the functions with
state sets $\Sigma$ and $\Sigma'$ respectively induced by $\mathcal{M}$ and
$\mathcal{M}'$; if $v = \sigma_1 \cdots \sigma_k$ then
$v' = \sigma'_1 \cdots \sigma'_k$ and $\varphi(\sigma_i) = \sigma'_i$; if
$v = \lambda$, then $v' = \lambda$.

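Theorem 1.10: Let $\mathcal{M}$, $\mathcal{M}'$, $\Sigma$, $\Sigma'$ be as in
Definition 1.4. Let $H_\sigma$ be a matrix the columns of which are a basis for
the set of all vectors of the form $\eta_{\sigma v}$ and let $H$ be the matrix
[$\Sigma = \{\sigma_1, \ldots, \sigma_k\}$]

$$H = \begin{bmatrix} H_{\sigma_1} & & 0 \\ & \ddots & \\ 0 & & H_{\sigma_k} \end{bmatrix}$$

then $\mathcal{M}$ and $\mathcal{M}'$ are equivalent with respect to $\Sigma$ and
$\Sigma'$ if there exists an $|S'| \times |S|$ matrix $X$ and a one-to-one mapping
$\varphi: \Sigma \to \Sigma'$ such that:

(1) $X_{\sigma'\sigma} \neq 0$ only if $\sigma' = \varphi(\sigma)$, where
$X_{\sigma'\sigma}$ is the submatrix of $X$ with rows corresponding to the block
$\sigma' \in \Sigma'$ and columns corresponding to the block $\sigma \in \Sigma$;
(2) $\pi' X H = \pi H$; (3) $XAH = A'XH$; (4) $\eta' = X\eta$.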


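Proof: Write $\sigma' = \varphi(\sigma)$ and $\delta' = \varphi(\delta)$. (1) and
(2) imply that

$$\pi'_{\sigma'} X_{\sigma'\sigma} H_\sigma = \pi_\sigma H_\sigma \tag{5}$$

(1) and (3) imply that

$$X_{\sigma'\sigma} A_{\sigma\delta} H_\delta = A'_{\sigma'\delta'} X_{\delta'\delta} H_\delta \tag{6}$$

(1) and (4) imply that

$$\eta'_{\sigma'} = X_{\sigma'\sigma}\eta_\sigma \tag{7}$$

Now $f(\sigma_1 \cdots \sigma_k) = \pi_{\sigma_1}\eta_{\sigma_1 \cdots \sigma_k} = \pi_{\sigma_1} A_{\sigma_1\sigma_2} \cdots A_{\sigma_{k-1}\sigma_k}\eta_{\sigma_k}$.
Using (5) and observing that $\eta_{\sigma_1 \cdots \sigma_k}$ is a linear
combination of the columns of $H_{\sigma_1}$, we have

$$f(\sigma_1 \cdots \sigma_k) = \pi_{\sigma_1}\eta_{\sigma_1 \cdots \sigma_k} = \pi'_{\sigma'_1} X_{\sigma'_1\sigma_1}\eta_{\sigma_1 \cdots \sigma_k} = \pi'_{\sigma'_1} X_{\sigma'_1\sigma_1} A_{\sigma_1\sigma_2}\eta_{\sigma_2 \cdots \sigma_k}$$

Using (6) and observing that $\eta_{\sigma_2 \cdots \sigma_k}$ is a linear
combination of the columns of $H_{\sigma_2}$, and repeating as many times as
necessary we have

$$f(\sigma_1 \cdots \sigma_k) = \pi'_{\sigma'_1} A'_{\sigma'_1\sigma'_2} \cdots A'_{\sigma'_{k-1}\sigma'_k} X_{\sigma'_k\sigma_k}\eta_{\sigma_k}
= \pi'_{\sigma'_1} A'_{\sigma'_1\sigma'_2} \cdots A'_{\sigma'_{k-1}\sigma'_k}\eta'_{\sigma'_k} = f'(\sigma'_1 \cdots \sigma'_k)$$

by (7). The proof is complete. ∎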
Remark: Theorem 1.10 provides us with a sufficient condition for equivalence
of two functions of different pseudo Markov chains. In fact, one can prove
[see Theorem 1.12 below] that the conditions of the above theorem are also
necessary if the chain $\mathcal{M}'$ with partition $\Sigma'$ over its state set
resulted from a construction as in the proof of Theorem 1.5. If the matrix $H$
in the conditions (2) and (3) of Theorem 1.10 is ignored, then a weaker set of
sufficient conditions for equivalence results. These weaker conditions are
summarized in the following:


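Corollary 1.11: Let $\mathcal{M}$, $\mathcal{M}'$, $\Sigma$, $\Sigma'$ be as in
Definition 1.4. $\mathcal{M}$ and $\mathcal{M}'$ are equivalent with respect to
$\Sigma$ and $\Sigma'$ if there exists an $|S'| \times |S|$ matrix $X$ and a
one-to-one mapping $\varphi: \Sigma \to \Sigma'$ such that: (1)
$X_{\sigma'\sigma} \neq 0$ only if $\sigma' = \varphi(\sigma)$; (2) $\pi' X = \pi$;
(3) $XA = A'X$; (4) $\eta' = X\eta$.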
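
The four conditions of Corollary 1.11 are finite matrix identities, so for given
chains and a candidate matrix $X$ they can be checked mechanically. The sketch
below is an illustration only: the function name `satisfies_corollary_1_11`, the
tuple layout of a chain, and the two-state data are assumptions made for the
example and do not come from the text.

```python
from fractions import Fraction as F


def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def satisfies_corollary_1_11(chain, chain2, X, phi):
    """chain = (pi, A, eta, block_of) for M; chain2 likewise for M'.

    X has |S'| rows and |S| columns; phi maps block names of M to those of M'.
    """
    pi, A, eta, blk = chain
    pi2, A2, eta2, blk2 = chain2
    # (1) X may be nonzero only between a block and its image under phi.
    cond1 = all(X[i][j] == 0 or blk2[i] == phi[blk[j]]
                for i in range(len(X)) for j in range(len(X[0])))
    cond2 = matmul([pi2], X) == [pi]                                # (2) pi' X = pi
    cond3 = matmul(X, A) == matmul(A2, X)                           # (3) X A = A' X
    cond4 = [r[0] for r in matmul(X, [[e] for e in eta])] == eta2   # (4) eta' = X eta
    return cond1 and cond2 and cond3 and cond4


# Hypothetical data: a two-state pseudo chain M and a true Markov chain M'.
M = ([F(1, 2), F(1)],
     [[F(3, 4), F(1, 2)], [F(1, 8), F(3, 4)]],
     [F(1), F(1, 2)],
     ["s", "t"])
M2 = ([F(1, 2), F(1, 2)],
      [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],
      [F(1), F(1)],
      ["s'", "t'"])
X = [[F(1), F(0)], [F(0), F(2)]]
phi = {"s": "s'", "t": "t'"}

print(satisfies_corollary_1_11(M, M2, X, phi))
```

A checker of this kind only verifies a proposed $X$; finding one is the hard part
discussed in the remainder of this subsection.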


Consider the following problem: Given a probabilistic sequential function f 
of finite rank k, is this function representable as a function of a [true] Markov 
chain? If yes, then find an underlying Markov chain. 

Using Theorem 1.5 we can find an underlying pseudo Markov chain such
that $f$ is a function of it over some state set. We can try now to use Theorem
1.10 or Corollary 1.11, replacing $\mathcal{M}$ or $\mathcal{M}'$ by the pseudo
Markov chain above and trying to find another, true, Markov chain which will
satisfy the requirements of the theorem or its corollary.

Let conditions (1)-(4) of Theorem 1.10 be considered as equations, with
$\mathcal{M}'$ replaced by the pseudo Markov chain found by using Theorem 1.5,
and $\mathcal{M}$, $X$, and $\varphi$ variables. If the given function is a
function of a true Markov chain, then a solution to those equations must exist
with $A$, $\pi$ stochastic and $\eta$ having all its entries equal to one. This
follows from the following:


Theorem 1.12: Let $\mathcal{M}$, $\mathcal{M}'$, $\Sigma$, $\Sigma'$ be as in
Definition 1.4. If $\mathcal{M}$ and $\mathcal{M}'$ are equivalent with respect
to $\Sigma$ and $\Sigma'$ and $\mathcal{M}'$ is a pseudo Markov chain derived as
in Theorem 1.5, then $\mathcal{M}$ and $\mathcal{M}'$ satisfy the conditions
(1)-(4) of Theorem 1.10 for some matrix $X$.


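Proof: By (34) [see proof of Theorem 1.5]
$P'_{\delta\sigma}(\sigma) = A'_{\delta\sigma}P'_\sigma$ where $A'_{\delta\sigma}$
are submatrices of the matrix $A'$. As $\mathcal{M}$ and $\mathcal{M}'$ are
equivalent we have also that
$P_\sigma = [f(v_{\sigma i}\sigma v'_{\sigma j})] = [f'(v_{\sigma i}\sigma v'_{\sigma j})] = P'_\sigma$
[$f$ and $f'$ denote the functions corresponding to $\mathcal{M}$ and
$\mathcal{M}'$ respectively] so that $P'_\sigma = G_\sigma H_\sigma$ where
$G_\sigma$ and $H_\sigma$ are as in Corollary 1.6 and
$P_{\delta\sigma}(\sigma) = G_\delta A_{\delta\sigma}H_\sigma$. Thus
$G_\delta A_{\delta\sigma}H_\sigma = P_{\delta\sigma}(\sigma) = A'_{\delta\sigma}P_\sigma = A'_{\delta\sigma}G_\sigma H_\sigma$.
Let $H$ be a matrix as in the formulation of Theorem 1.10 and let $G$ be a matrix
constructed in the same way from the matrices $G_\sigma$. Then the above equation
implies that $GAH = A'GH$ and this is condition (3) in Theorem 1.10 with $G$
replacing $X$ and satisfying (1) in that theorem.

Now

$$\pi_\sigma H_\sigma = \pi_\sigma\bigl(\eta_{\sigma v'_{\sigma 1}}, \ldots, \eta_{\sigma v'_{\sigma k(\sigma)}}\bigr) = \bigl(f(\sigma v'_{\sigma 1}) \cdots f(\sigma v'_{\sigma k(\sigma)})\bigr)$$

By the construction of $\pi'$ in Theorem 1.5, $\pi'_\sigma$ is a vector with first
entry equal to one, all the other entries being equal to zero. Therefore,
$\pi'_\sigma G_\sigma H_\sigma = \pi'_\sigma P'_\sigma = \bigl(f'(\sigma v'_{\sigma 1}) \cdots f'(\sigma v'_{\sigma k(\sigma)})\bigr)$.
Since $f = f'$, $\pi H = \pi' G H$, verifying (2) of Theorem 1.10.
Finally, $\eta'_\sigma = \bigl(f'(\sigma) \cdots f'(v_{\sigma k(\sigma)}\sigma)\bigr)^T$
by the definition of $\eta'_\sigma$ and

$$G_\sigma\eta_\sigma = \begin{bmatrix} \pi_{v_{\sigma 1}\sigma} \\ \vdots \\ \pi_{v_{\sigma k(\sigma)}\sigma} \end{bmatrix}\eta_\sigma = \bigl(f(\sigma) \cdots f(v_{\sigma k(\sigma)}\sigma)\bigr)^T$$

and this completes the proof for $f = f'$. ∎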

It follows from Theorem 1.12 that if no solution exists to the conditions of
Theorem 1.10 considered as equations, with $\mathcal{M}'$ the pseudo Markov chain
derived as in Theorem 1.5 for the given function and $\mathcal{M}$ a variable
true Markov chain, then the given function is not a function of a true Markov
chain. Conversely, any solution to the above four equations with the given
restrictions provides an underlying Markov chain for the given function.
Unfortunately the use of Theorem 1.10 with its conditions considered as
equations, as above, is not practical in general, for there are too many free
parameters involved [the rank of $\mathcal{M}$, the matrix $X$, the partition
$\Sigma$, the matrix $H$, etc.]. On the other hand if the roles of $\mathcal{M}$
and $\mathcal{M}'$ are interchanged, i.e., the conditions of Theorem 1.10 are
considered as equations with $\mathcal{M}$ known and derived as in Theorem 1.5
and $\mathcal{M}'$ an unknown Markov chain, then Theorem 1.10 is equivalent to
its Corollary 1.11. This follows from the fact that in this case $H_\sigma$ is a
nonsingular matrix for every $\sigma \in \Sigma$ [see Corollary 1.6] so that $H$
is nonsingular and can therefore be deleted from both sides of conditions (2)
and (3) in that theorem. These considerations together with Corollary 1.11 and
Theorem 1.12 lead also to the following:


Theorem 1.13: Let $f$ be a probabilistic sequential function of rank $k$. Let $M$
be the pseudo Markov chain with partition $\Sigma$ over its states such that $f$
is its function as found in Theorem 1.5. $f$ is a function of a true Markov chain
with $k$ states if and only if the conditions of Corollary 1.11, when considered
as equations with $\mathcal{M} = M$ or $\mathcal{M}' = M$, the other chain
involved being variable, admit a solution such that the matrix $X$ is nonsingular.

Proof is left to the reader. 


f. Examples 


We conclude this subsection with some examples in which we shall make use 
of Corollary 1.11 to solve some particular cases. 


Example 16: Let $f$ be a probabilistic sequential function of rank $k$ and let
$\mathcal{M} = (\pi, S, A, \eta)$ be some underlying pseudo Markov chain with
partition $\Sigma$ over $S$, as derived in Theorem 1.5. If all entries in $A$ are
nonnegative, then $f$ is a function of a true Markov chain with $k$ states.

Proof: We remark first that all the entries in $\eta$ are positive, for they have
the form $f(v_{\sigma i}\sigma)$ and if for some $i$ and $\sigma$,
$f(v_{\sigma i}\sigma) = 0$ then $P_\sigma$ has its $i$th row equal to zero [the
entries in the $i$th row of $P_\sigma$ are of the form
$f(v_{\sigma i}\sigma v'_{\sigma j})$] which is impossible since $P_\sigma$ is
nonsingular. Let now $X_\sigma$ be a square diagonal matrix with $i$th diagonal
entry equal to $(f(v_{\sigma i}\sigma))^{-1}$ and let $X$ be the [nonsingular]
matrix

$$X = \begin{bmatrix} X_{\sigma_1} & & 0 \\ & \ddots & \\ 0 & & X_{\sigma_k} \end{bmatrix}$$

Then $X\eta = \eta'$, where $\eta'$ is a vector all the entries of which are equal
to one. Let $A'$ be the matrix $A' = XAX^{-1}$ and let $\pi' = \pi X^{-1}$. Then
all the entries in $\pi'$ and $A'$ are nonnegative, since, by construction, $X$
and $X^{-1}$ are diagonal and nonnegative, $\pi$ is nonnegative and so is $A$ by
assumption. Furthermore, $\pi'\eta' = \pi X^{-1}X\eta = \pi\eta = 1$ [$\pi\eta = 1$
by the definition in Theorem 1.5] and
$A'\eta' = XAX^{-1}X\eta = XA\eta = X\eta = \eta'$ [$A\eta = \eta$ by Corollary
1.6] and therefore $\pi'$ and $A'$ are Markov matrices [since $\eta'$ is a vector
all the entries of which are equal to one]. It follows from Corollary 1.11 that
the pseudo Markov chain $\mathcal{M}$ with partition $\Sigma$ is equivalent to the
true Markov chain $\mathcal{M}' = (\pi', S, A', \eta')$ with the same partition
$\Sigma$. ∎
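
The diagonal rescaling used in this proof is easy to carry out numerically. The
Python sketch below is a minimal illustration under stated assumptions: the
two-state data are hypothetical [chosen so that $A \ge 0$, $A\eta = \eta$ and
$\pi\eta = 1$], and the helper name `rescale` is not from the text. With
$X = \mathrm{diag}(1/\eta_i)$ one gets $\pi'_i = \pi_i\eta_i$ and
$A'_{ij} = A_{ij}\eta_j/\eta_i$.

```python
from fractions import Fraction as F


def rescale(pi, A, eta):
    """Apply X = diag(1/eta_i): return pi' = pi X^{-1} and A' = X A X^{-1}."""
    n = len(eta)
    pi2 = [pi[i] * eta[i] for i in range(n)]
    A2 = [[A[i][j] * eta[j] / eta[i] for j in range(n)] for i in range(n)]
    return pi2, A2


# Hypothetical pseudo Markov chain with nonnegative A and positive eta.
pi = [F(1, 2), F(1)]
A = [[F(3, 4), F(1, 2)],
     [F(1, 8), F(3, 4)]]
eta = [F(1), F(1, 2)]

pi2, A2 = rescale(pi, A, eta)
print(pi2)
print(A2)
```

In this illustration the rescaled vector and matrix have nonnegative entries and
unit row sums, as the argument of Example 16 predicts.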


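Example 17: Let $f$, $\mathcal{M}$, and $\Sigma$ be as in Example 16. Let
$\mathcal{H}_\sigma$ be the set of all vectors of the form $\eta_{\sigma v}$ and
let $\mathcal{H}_\sigma^+$ be the set of all $r(\sigma)$-dimensional row vectors
$\pi^\sigma$ such that $\pi^\sigma\eta_{\sigma v} \ge 0$ for any
$\eta_{\sigma v} \in \mathcal{H}_\sigma$. If, for every $\sigma \in \Sigma$,
$\mathcal{H}_\sigma^+$ contains a finite set of vectors
$\pi_1^\sigma, \ldots, \pi_{k(\sigma)}^\sigma$ such that every vector in
$\mathcal{H}_\sigma^+$ can be expressed as a nonnegative combination of them,
then $f$ is a function of a true Markov chain.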


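Proof: We remark first that if $\pi_1^\sigma, \ldots, \pi_{k(\sigma)}^\sigma$ is a
set of vectors satisfying the condition stated above then also the set
$k_{\sigma 1}\pi_1^\sigma, \ldots, k_{\sigma k(\sigma)}\pi_{k(\sigma)}^\sigma$
satisfies that condition, with $k_{\sigma 1}, \ldots, k_{\sigma k(\sigma)}$ an
arbitrary sequence of positive constants.

Let $X_\sigma$ be the $k(\sigma) \times r(\sigma)$ matrix with $i$th row equal to
$k_{\sigma i}\pi_i^\sigma$ where the $k_{\sigma i}$'s are positive constants
chosen so as to have $k_{\sigma i}\pi_i^\sigma\eta_\sigma = 1$. To prove that such
a choice is possible, we must prove that $\pi_i^\sigma\eta_\sigma \neq 0$. Indeed,
$\pi_i^\sigma\eta_\sigma \ge 0$ by assumption and if $\pi_i^\sigma\eta_\sigma = 0$,
then $0 = \pi_i^\sigma\eta_\sigma = \pi_i^\sigma\sum_\delta\eta_{\sigma\delta}$,
which would imply that $\pi_i^\sigma\eta_{\sigma\delta} = 0$ [for
$\pi_i^\sigma\eta_{\sigma\delta} \ge 0$] and by induction
$\pi_i^\sigma\eta_{\sigma v} = 0$ for any element of $\mathcal{H}_\sigma$. We would
have in particular that $\pi_i^\sigma H_\sigma = 0$ where $H_\sigma$ is the
nonsingular matrix as defined in Corollary 1.6, this implying that
$\pi_i^\sigma = 0$. It follows that a matrix $X_\sigma$ as above can be
constructed and $X_\sigma\eta_\sigma = \eta'_\sigma$ where $\eta'_\sigma$ is a
column vector with all its entries equal to one. Consider now a vector of the
form $k_{\sigma i}\pi_i^\sigma A_{\sigma\delta}$. This vector belongs to
$\mathcal{H}_\delta^+$, for
$k_{\sigma i}\pi_i^\sigma A_{\sigma\delta}\eta_{\delta v} = k_{\sigma i}\pi_i^\sigma\eta_{\sigma\delta v} \ge 0$,
and can therefore be expressed as a nonnegative combination of the vectors
$k_{\delta j}\pi_j^\delta$. One can thus construct a matrix $A'_{\sigma\delta}$
such that $X_\sigma A_{\sigma\delta} = A'_{\sigma\delta}X_\delta$ where the $i$th
row of $A'_{\sigma\delta}$ is the vector of the coefficients of the nonnegative
combination of the rows of $X_\delta$ corresponding to the row
$k_{\sigma i}\pi_i^\sigma A_{\sigma\delta}$ in the left-hand side of the equation.
Finally, $\pi_\sigma$ is in $\mathcal{H}_\sigma^+$ [for
$\pi_\sigma\eta_{\sigma v} \ge 0$] and therefore can be expressed as a nonnegative
combination of the rows of $X_\sigma$ in the form
$\pi_\sigma = \pi'_\sigma X_\sigma$ with $\pi'_\sigma$ nonnegative.

Let $X$ be the matrix with diagonal blocks $X_\sigma$, the other entries being
zero; let $A'$ be the matrix whose $\sigma\delta$ blocks are $A'_{\sigma\delta}$;
let $\pi' = (\pi'_{\sigma_1} \cdots \pi'_{\sigma_k})$ and
$\eta' = (\eta'^T_{\sigma_1} \cdots \eta'^T_{\sigma_k})^T$ with
$\Sigma = \{\sigma_1, \ldots, \sigma_k\}$. It follows from Corollary 1.11 that the
resulting chain is equivalent to the given one with respect to $\Sigma$. But the
resulting chain is Markovian since $\eta'$ has already the required properties
(all its entries are equal to one), $\pi'$ and $A'$ are nonnegative with
$\pi'\eta' = \pi'X\eta = \pi\eta = 1$ and
$A'\eta' = A'X\eta = XA\eta = X\eta = \eta'$, so that $\pi'$ and $A'$ are
stochastic. ∎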


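Example 18: Let $\mathcal{M} = (\pi, S, A, \eta)$ be the pseudo Markov chain with

$$A = \begin{bmatrix}
0.5 & 0 & 0 & 0 & 0.5 \\
0 & -0.4 & 0 & 0 & 1.4 \\
0 & 0 & 0.5 & 0 & 0.5 \\
0 & 0 & 0 & -0.3 & 1.3 \\
0.25 & 0.084 & 0.25 & -0.078 & 0.494
\end{bmatrix}$$

$S = \{s_1, s_2, \ldots, s_5\}$, $\pi = (0.25\ \ 0.03\ \ 0.25\ \ {-0.03}\ \ 0.5)$ and
$\eta = (1\ 1\ 1\ 1\ 1)^T$.
Let $\Sigma$ be the partition
$\Sigma = \{\{s_1 s_2\}, \{s_3 s_4\}, \{s_5\}\} = \{\alpha, \beta, \gamma\}$ over $S$.
We show first that the resulting function $f$ is a function of a true Markov
chain. To prove this fact we use Corollary 1.11 and the argument used in the
previous example. Let $X$ be the (regular) matrix

$$X = \begin{bmatrix}
0.7 & 0.3 & 0 & 0 & 0 \\
1.55 & -0.55 & 0 & 0 & 0 \\
0 & 0 & 0.7 & 0.3 & 0 \\
0 & 0 & 1.5 & -0.5 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}$$
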

Then $X\eta = \eta$ and one verifies easily that the equations $\pi = \pi'X$ and
$XA = A'X$ can be solved for $\pi'$ and $A'$ with nonnegative entries [the reader
is urged to complete the computations]. It follows that the chain
$\mathcal{M}' = (\pi', S, A', \eta)$ is Markovian [the same argument used in the
previous example will prove this]. If the partition $\Sigma$ is as before [for
the given $\mathcal{M}$ chain], then the resulting function $f'$ [identical to
the given one] is a function of a true Markov chain.
Consider now the partition
$\bar\Sigma = \{\{s_1 s_2 s_3 s_4\}, \{s_5\}\} = \{\delta, \gamma\}$ over $S$. The
induced function $f$ is again a function of a true Markov chain, the
$\mathcal{M}'$ chain, for the blocks of $\bar\Sigma$ can be constructed by merging
blocks of $\Sigma$. [This implies that Corollary 1.11 can be used for the
partition $\bar\Sigma$ with the same matrix $X$ and the resulting underlying chain
$\mathcal{M}'$ will be the same as before.] To compute the actual values of $f$
one can use a 4-state pseudo Markov chain
$\bar M = (\bar\pi, \bar S, \bar A, \bar\eta)$ instead of the given one
$\mathcal{M}$, with $\bar M$ derived from $\mathcal{M}$ by merging the states
$s_1$ and $s_3$, i.e., $\bar\pi = (0.5\ \ 0.03\ \ {-0.03}\ \ 0.5)$,

$$\bar A = \begin{bmatrix}
0.5 & 0 & 0 & 0.5 \\
0 & -0.4 & 0 & 1.4 \\
0 & 0 & -0.3 & 1.3 \\
0.5 & 0.084 & -0.078 & 0.494
\end{bmatrix}$$

$\bar S = \{\bar s_1, \bar s_2, \bar s_3, \bar s_4\}$, $\bar\eta = (1\ 1\ 1\ 1)^T$
and the partition $\Sigma'$ will be
$\Sigma' = \{\{\bar s_1 \bar s_2 \bar s_3\}, \{\bar s_4\}\} = \{\delta, \gamma\}$.
The function induced by $\bar M$ with partition $\Sigma'$ is equal to the function
induced by $\mathcal{M}$ with partition $\bar\Sigma$ because the states $s_1$ and
$s_3$ have the same distribution in $\mathcal{M}$ [the pseudo probability of a
sequence of states is not changed if the state $s_1$ is replaced by state $s_3$
or vice versa in the sequence], and they are both in the same block of
$\bar\Sigma$.
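
The computation that the reader is urged to complete can be carried out with
exact rational arithmetic. The following Python sketch is one way to do so and is
not part of the original text; the helper names are hypothetical. It assumes
that $X$ is block diagonal with respect to $\Sigma$, i.e., that its entry in
row 4, column 5 is zero, as required by condition (1) of Corollary 1.11 and by
$X\eta = \eta$. Since $X$ is nonsingular, $A' = XAX^{-1}$ and
$\pi' = \pi X^{-1}$.

```python
from fractions import Fraction as F


def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def inverse(M):
    """Gauss-Jordan inverse over the rationals (M is assumed nonsingular)."""
    n = len(M)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def rows(text):
    """Parse whitespace-separated decimal entries into exact fractions."""
    return [[F(x) for x in line.split()] for line in text.strip().splitlines()]


A = rows("""
0.5  0     0    0      0.5
0   -0.4   0    0      1.4
0    0     0.5  0      0.5
0    0     0   -0.3    1.3
0.25 0.084 0.25 -0.078 0.494
""")
X = rows("""
0.7   0.3  0   0   0
1.55 -0.55 0   0   0
0     0    0.7 0.3 0
0     0    1.5 -0.5 0
0     0    0   0   1
""")
pi = rows("0.25 0.03 0.25 -0.03 0.5")

X_inv = inverse(X)
A2 = matmul(matmul(X, A), X_inv)   # A' = X A X^{-1}
pi2 = matmul(pi, X_inv)[0]         # pi' = pi X^{-1}

for row in A2:
    print([str(x) for x in row])
print([str(x) for x in pi2])
```

Under the stated assumption on $X$, every entry of the computed $A'$ and $\pi'$
is nonnegative and every row sum equals one.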

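We find now $r(f)$. Clearly, $r(f) = r(\delta) + r(\gamma) = r(\delta) + 1$ for
$\gamma$ is a single state block [see Corollary 1.2], and by Theorem 1.1,
$r(\delta) \le |\delta| = 3$. To find the actual value of $r(\delta)$ we compute
the values $f(\delta^n)$ for $n = 1, 2, \ldots, 5$:

$$f(\delta^n) = (0.5\ \ 0.03\ \ {-0.03})
\begin{bmatrix} (0.5)^{n-1} & 0 & 0 \\ 0 & (-0.4)^{n-1} & 0 \\ 0 & 0 & (-0.3)^{n-1} \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

$$= 0.5(0.5)^{n-1} + 0.03(-0.4)^{n-1} - 0.03(-0.3)^{n-1}
= 0.5(0.5)^{n-1} + 0.03\bigl((-0.4)^{n-1} - (-0.3)^{n-1}\bigr)$$

resulting in

$$f(\delta^n) = 0.5,\ 0.247,\ 0.1271,\ 0.06139 \text{ and } 0.031775$$

respectively for $n = 1, 2, \ldots, 5$. Let $P_\delta$ be the compound sequence
matrix based on the sequences $v_1 = v'_1 = \lambda$, $v_2 = v'_2 = \delta$,
$v_3 = v'_3 = \delta^2$; then

$$P_\delta = \begin{bmatrix}
0.5 & 0.247 & 0.1271 \\
0.247 & 0.1271 & 0.06139 \\
0.1271 & 0.06139 & 0.031775
\end{bmatrix}$$

which can easily be shown to be nonsingular. Thus, $r(\delta) = 3$ and $r(f) = 4$.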
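
The five values of $f(\delta^n)$ and the nonsingularity of $P_\delta$ can be
confirmed by a direct computation. The sketch below is an illustrative check, not
part of the original text; the names `f_delta_power` and `det3` are hypothetical.
It uses the closed form
$f(\delta^n) = 0.5(0.5)^{n-1} + 0.03(-0.4)^{n-1} - 0.03(-0.3)^{n-1}$ and exact
fractions.

```python
from fractions import Fraction as F

weights = [F("0.5"), F("0.03"), F("-0.03")]    # entries of the delta block of the initial vector
eigen = [F("0.5"), F("-0.4"), F("-0.3")]       # diagonal of the delta block of the 4-state matrix


def f_delta_power(n):
    """f(delta^n) for n >= 1, from the diagonal form of the delta block."""
    return sum(w * lam ** (n - 1) for w, lam in zip(weights, eigen))


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, g), (h, i, j) = m
    return a * (e * j - g * i) - b * (d * j - g * h) + c * (d * i - e * h)


values = [f_delta_power(n) for n in range(1, 6)]
# Entry (i, j) of P_delta is f(delta^i delta delta^j) = f(delta^(i+j+1)), i, j = 0, 1, 2.
P_delta = [[f_delta_power(i + j + 1) for j in range(3)] for i in range(3)]

print([float(v) for v in values])
print(float(det3(P_delta)))
```

The determinant is small but nonzero, which is what the rank argument requires.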

We shall complete this example by showing that $f$, although a function of a
5-state true Markov chain, is not a function of a true 4-state Markov chain. To
prove this, we use Theorem 1.13. If $f$ is a function of a true 4-state Markov
chain, then by Theorem 1.13 [$r(f) = 4$ as proved above and $\bar M$ is a four
state pseudo Markov chain] we would have that $X\bar A = A'X$ for some stochastic
$A'$ and $X$ nonsingular of the form

$$X = \begin{bmatrix} X_\delta & 0 \\ 0 & X_\gamma \end{bmatrix}$$

Thus, $X_\delta \bar A_{\delta\delta} = A'_{\delta\delta}X_\delta$ or
$X_\delta \bar A_{\delta\delta}X_\delta^{-1} = A'_{\delta\delta}$.
$\bar A_{\delta\delta}$ and $A'_{\delta\delta}$ being similar matrices, their
traces must be equal. But the trace of $\bar A_{\delta\delta}$ is negative by the
definition of $\bar A$ [it equals $0.5 - 0.4 - 0.3 = -0.2$] and the trace of
$A'_{\delta\delta}$ cannot be negative for $A'$ is assumed to be stochastic.

Thus $f$ cannot be a function of a 4-state Markov chain and the proof is
complete.


Remark: Example 18 shows that functions of true Markov chains may exist
such that the number of states of the underlying Markov chain is strictly bigger
than the rank of the corresponding function. One may ask now whether there
exist functions of finite rank which are not representable as a function of a true
Markov chain. Fox (1967) and Dharmadhikari (1967) [see also Heller (1965)]
showed, by examples, that the answer to the above question is positive. The
examples of Fox and Dharmadhikari are too involved to be reproduced
here; moreover, their proofs seem to be incomplete. For additional aspects of
functions of Markov chains, the reader is referred to the following exercises
and the bibliographical notes which follow.


EXERCISES 

1. Prove the properties (1)-(5) of a function of a Markov chain given in Sub- 
section 1, a. 

2. Prove Lemma 1.4. 


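3. Let $0 < a_1 < a_2 < \cdots < a_{n-k+1} < 1$ be a sequence of numbers and
define the $n \times n$ matrix $M = [m_{ij}]$ as follows

$$m_{ij} = \begin{cases}
0 & \text{if } 1 \le i, j \le n-k+1 \text{ and } i \ne j \\
a_i & \text{if } 1 \le i, j \le n-k+1 \text{ and } i = j \\
(1 - a_j)/(k-1) & \text{if } 1 \le j \le n-k+1 < i \le n \\
(1 - a_i)/(k-1) & \text{if } 1 \le i \le n-k+1 < j \le n \\
\bigl(2(k-1) - n + \sum_i a_i\bigr)/(k-1)^2 & \text{if } n-k+1 < i, j \le n
\end{cases}$$

Let $\mathcal{M} = (S, \pi, M)$ be a Markov chain with $|S| = n$, $\pi$ an
$n$-dimensional vector all the entries of which are equal to $1/n$ and $M$ as
above. Finally, let $\Sigma$ be the partition
$\Sigma = \{\{s_1 \ldots s_{n-k+1}\}, \{s_{n-k+2}\}, \ldots, \{s_n\}\}$. Prove that
the function $f$ of the Markov chain $\mathcal{M}$ with partition $\Sigma$ over $S$
is such that $r(f) = n$ and compute the value of the determinant of
$P_\sigma(v_1, \ldots, v_{n-k+1}; v'_1, \ldots, v'_{n-k+1})$ [where
$v_1 = v'_1 = \lambda$, $v_i = v'_i = \sigma^{i-1}$ for $i > 1$ and $\sigma$ is the
first block in $\Sigma$] for $f$.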


4. Let $f$ be a function of a Markov chain with state set $\Sigma = \{\sigma_i\}$.
Let $\tilde f$ be the function $\tilde f(v) = f(\tilde v)$ where for any
$v = \sigma_1\sigma_2 \cdots \sigma_k$, $\tilde v = \sigma_k \cdots \sigma_1$. Prove
that if the underlying Markov chain for $f$ is stationary and its initial
distribution has only positive values, then $\tilde f$ is a function of a Markov
chain.


5. Let $f$ be a function of a pseudo Markov chain of finite rank, with state set
$\Sigma = \{\sigma, \delta\}$ such that $r(\delta) = 1$. Then a compound sequence
matrix $P_\sigma$ [see (27)] of maximal rank for $f$ can be chosen such that all
the entries in $P_\sigma$ have one of the forms $f(\delta\sigma^k\delta)$,
$f(\delta\sigma^k)$, $f(\sigma^k\delta)$, or $f(\sigma^k)$.


6. Let $f$ be a function of a true Markov chain of finite rank with state set
$\Sigma = \{\sigma_i\}$. Prove that any matrix $P_{\sigma_i}$ [see (27)] of maximal
rank for $f$ can be expressed as a finite sum of nonnegative matrices of rank 1.


7. Let $f$ be a sequential function of finite rank and let
$\mathcal{M} = (\pi, S, A, \eta)$ be a pseudo Markov chain as derived in Theorem
1.5 for $f$. Prove that another pseudo Markov chain $\mathcal{M}'$ for $f$ can be
found such that $\mathcal{M}' = (\pi', S, A', \eta')$ with
$\eta' = (1, 1, \ldots, 1)^T$, $A'\eta' = \eta'$ and $\pi'\eta' = 1$ [i.e., the
vector $\pi'$ and the matrix $A'$ are "pseudo stochastic" with row sums equal to
one].


8. Let $\mathcal{M}$, $\mathcal{M}'$, $\Sigma$, $\Sigma'$ be as in Definition 1.4
and assume that $\mathcal{M}$ and $\mathcal{M}'$ are equivalent with respect to
$\Sigma$ and $\Sigma'$. In addition assume that $|S| = \operatorname{rank} f$
where $f$ is the function induced by $\mathcal{M}$ (or $\mathcal{M}'$) with
partition $\Sigma$. Prove that there exist two matrices $B$, $C$ such that $B$ is
$|S| \times |S'|$, $C$ is $|S'| \times |S|$, $BC = I$ where $I$ is the
$|S| \times |S|$ unit matrix, and $A = BA'C$.


9. Let $f$ be a probabilistic sequential function over the state set
$\Sigma = \{\sigma_i\}$ such that $r(\sigma_i) \le 2$ for every
$\sigma_i \in \Sigma$. Then $f$ is a function of a true Markov chain.


10*. Let $f$ be a function of the Markov chain $\mathcal{M} = (\pi, S, A, \eta)$
with state set $\Sigma = \{\sigma_i\}$.

Prove the following relations: 

a. P,(X*7!o, X^ E "Op 3^7!g, ) m (4), (4 Dore," FH [A Taceo 
[See (28) and other definitions in Section 1,a] 

b. f(oX^^!o, Z^-! +. g, X710) = n,(4*),,, ++ (Aa aNs 

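c. $f(v\Sigma^n v')$ converges as $n \to \infty$ for every $v$ and $v'$ if and
only if $f(v_{\sigma i}\,\sigma\Sigma^n\delta\,v'_{\delta j})$ converges as
$n \to \infty$ for every $\sigma, \delta \in \Sigma$, and every $i$ and $j$.

d. $f(v\Sigma^n v') \to f(v)f(v')$ as $n \to \infty$ for every $v$ and $v'$ if and
only if
$f(v_{\sigma i}\,\sigma\Sigma^n\delta\,v'_{\delta j}) \to f(v_{\sigma i}\sigma)f(\delta v'_{\delta j})$
for every $\sigma$, $\delta$, $i$, and $j$.

e. $A^n$ converges as $n \to \infty$ if and only if $f(v\Sigma^n v')$ converges as
$n \to \infty$ for every $v$ and $v'$.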

11*. A sequential probabilistic function $f$, $f: \Sigma^* \to [0, 1]$, is termed
"mixing" if $f(v\Sigma^n v') \to f(v)f(v')$ as $n \to \infty$ for every $v$ and
$v'$. Prove the following:


Theorem: Let $f$ be a sequential probabilistic function of finite rank and
mixing. Let $g_m$ be the function derived from $f$ by the definition

$$g_m(\sigma_1 \cdots \sigma_k) = f(\sigma_1\Sigma^m\sigma_2\Sigma^m \cdots \sigma_k\Sigma^m)$$

Then there exists an integer $m^*$ such that $g_m$ is a function of a true Markov
chain for any $m > m^*$.


11. Let $f$, $\mathcal{M}$, and $\Sigma$ be as in Example 16. Prove that $f$ is a
function of a true Markov chain if the following condition holds true. For any
$\sigma \in \Sigma$, a finite set of $r(\sigma)$-dimensional row vectors
$\pi_1^\sigma, \ldots, \pi_{k(\sigma)}^\sigma$ can be found such that


1. $\pi_i^\sigma\eta_\sigma > 0$, $i = 1, 2, \ldots, k(\sigma)$, $\sigma \in \Sigma$.

2. $\pi_i^\sigma A_{\sigma\delta}$ can be expressed as a nonnegative combination of
the vectors $\pi_j^\delta$ for every $\sigma$ and $\delta \in \Sigma$.

3. $\pi_\sigma$ can be expressed as a nonnegative combination of the vectors
$\pi_i^\sigma$ for every $\sigma \in \Sigma$.


12. Find the true Markov chain equivalent to the pseudo Markov chain in Example
18 with the partition $\Sigma$ and the matrix $X$ as given in that example.


OPEN PROBLEMS 


1. Find an algorithm for ascertaining whether a given probabilistic sequential
function of rank $k$ is a function of a true Markov chain.


2. Find an algorithm for ascertaining whether a given function of a true 
Markov chain of rank k has an underlying true Markov chain with only k 
states. 


3. Provided that the conditions given in Example 17 or in Exercise 11 above
are known to hold true for a given function $f$ [e.g., this would be the case if
$r(\sigma) \le 2$ for any state $\sigma$ of $f$; see Exercise 9 above] give an
algorithm for finding the actual underlying true Markov chain.


Bibliographical Notes 


Functions of Markov chains were first studied by Blackwell and Koopmans
(1957). [See also the work of Harris (1955) who considered a related problem.]
Gilbert (1959) proved some of their basic properties. The subject has been
investigated afterwards by several authors: Fox (1959); Fox and Rubin (1965,
1967), who were able to use the theory for estimating the temporal behavior
of cloud cover (based on statistical data taken in the Boston area they
proved that the stochastic process involved can be represented as a function of
a Markov chain but not as a Markov chain); Dharmadhikari (1963a, b,
1965, 1967), who considered various aspects of the problem and gave some
sufficient conditions for a sequential function to be a function of a Markov
chain; Carlyle (1967), who considered a special case. Finally some new additions
to the theory have been achieved by Heller (1965) and Depeyrot (1968). The
section presented here is based mainly on the work of Gilbert (1959) with
additions and some of the exercises based on the subsequent work. Thus Gilbert is
to be credited for the basic ideas underlying the theorems and corollaries 1.1 to
1.9, with some clarifications by Dharmadhikari who is to be credited also with
Examples 17, 18 and Exercises 9 and 10. Theorem 1.10 and its Corollary 1.11
are new. Theorems 1.12 and 1.13 are a generalization of a theorem of Gilbert
who is to be credited also with Exercise 3. Exercises 5 and 6 are due to Fox.
Finally Exercise 11 is similar to a theorem of Heller. Additional reference:
Burke and Rosenblatt (1958).


2. Function Induced by Valued Markov Systems 


A theory of input-output relations was developed in Section I, C. In the light 
of that theory, functions of Markov chains can be considered as output rela- 
tions, since, if f(v) is such a function, then the value f(v) can be interpreted as 
the probability that the word v is the output of a given Markov chain. In this 
section we shall develop a theory of word functions which can be considered 
as input relations derived from nonhomogeneous Markov chains. 


a. Valued Markov Systems 


Definition 2.1: A valued Markov system is a 4-tuple
$(\pi, S, \{A(\sigma)\}, \{\eta_i\}_{i \in Z})$ where $(S, \{A(\sigma)\})$ is a
Markov system, $\pi$ is a probabilistic vector of dimension $|S|$ and
$\{\eta_i\}$ is a finite set of $|S|$-dimensional arbitrary column vectors [the
entries in $\eta_i$ are arbitrary real numbers]. With every $i \in Z$ the function
$f_i$ over $\Sigma^*$, induced by the valued Markov system, is defined as
$f_i(u) = \pi A(u)\eta_i$ with $u = \sigma_1 \cdots \sigma_k \in \Sigma^*$ and
$A(u) = A(\sigma_1) \cdots A(\sigma_k)$. $f_i(\lambda) = \pi\eta_i$ by definition.
The functions $f_i(u)$ will be called input (word) functions.

The values of the input functions $f_i(u)$ can be interpreted as expectations or
costs, since, denoting by $\eta_{ij}$ the $j$th entry in $\eta_i$, we have that
$f_i(u) = \sum_j \pi_j(u)\eta_{ij}$ and $\pi_j(u)$ is a probability [the
probability that the Markov system when started with distribution $\pi$ will end
scanning the word $u$ in state $j$]. If the values $\eta_{ij}$ are either 0 or 1
then $f_i(u)$ can be interpreted as a probability [the probability that the
system when started with distribution $\pi$ will end scanning the word $u$ in one
of the states $s_j$ such that $\eta_{ij} = 1$].
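
The definition $f_i(u) = \pi A(u)\eta_i$ translates directly into a short
computation. The Python sketch below is a minimal illustration; the two-state
system, the input alphabet `"0"`, `"1"` and the two value vectors [one with
arbitrary real "costs", one with 0-1 entries] are hypothetical and chosen only
to show the two interpretations mentioned above.

```python
from fractions import Fraction as F


def input_function(pi, A, eta, word):
    """Return pi A(u) eta for u = word, where A maps each input symbol to a matrix."""
    n = len(pi)
    row = list(pi)
    for sym in word:
        # Propagate the state distribution through the matrix of the scanned symbol.
        row = [sum(row[r] * A[sym][r][s] for r in range(n)) for s in range(n)]
    return sum(row[s] * eta[s] for s in range(n))


# Hypothetical two-state valued Markov system over the input alphabet {0, 1}.
pi = [F(1), F(0)]
A = {
    "0": [[F(1, 2), F(1, 2)], [F(0), F(1)]],
    "1": [[F(1), F(0)], [F(1, 2), F(1, 2)]],
}
eta_cost = [F(3), F(-1)]       # arbitrary real values: f is an expected cost
eta_indicator = [F(0), F(1)]   # 0-1 values: f is the probability of ending in state 2

print(input_function(pi, A, eta_cost, "0"))
print(input_function(pi, A, eta_indicator, "00"))
```

With the 0-1 vector the value is a probability between zero and one, while the
cost vector can produce any real value.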

Input functions $f_i(u)$ induced by valued Markov systems differ from the
functions considered in the previous section in that they do not necessarily
satisfy the relations (23) and (24). In addition, although input functions can be
induced by input-output relations, the correspondence is not always one to one.
Thus, let $A$ be an SSM, given in the Moore form,
$A = (S, X, Y, \{A(x)\}, \Lambda)$ [see Definition 2.1 in Chapter I] with initial
distribution $\pi$. Define the valued Markov system
$(S, \pi, \{A(x)\}, \{\eta_y\}_{y \in Y})$ with $X = \Sigma$, $Z = Y$ and
$\eta_{yi} = 1$ if and only if $\Lambda(s_i) = y$. It is easily seen that the
input functions $f_y$ induced by the valued Markov system can be defined in terms
of the input-output relation induced by the SSM with

$$f_y(u) = \sum_{l(v) = l(u) - 1} p^A(vy \mid u), \qquad l(u) \ge 1.$$

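On the other hand, it may happen that two nonequivalent input-output relations
induce the same input function. This is shown by the following.

Example 19: Let $A = (S, X, Y, \{A(x)\}, \Lambda)$ and
$A' = (S, X, Y, \{A'(x)\}, \Lambda)$ be two SSMs [with common $S$, $X$, $Y$, and
$\Lambda$] such that $X = \{0, 1\}$, $Y = \{a, b\}$,
$S = \{s_1, s_2, s_3, s_4\}$, $\Lambda(s_1) = \Lambda(s_2) = a$,
$\Lambda(s_3) = \Lambda(s_4) = b$,

$$A(0) = A'(0) = \begin{bmatrix}
\tfrac12 & 0 & 0 & \tfrac12 \\
\tfrac12 & 0 & 0 & \tfrac12 \\
\tfrac12 & 0 & 0 & \tfrac12 \\
\tfrac12 & 0 & 0 & \tfrac12
\end{bmatrix}$$

$$A(1) = \tfrac12\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}, \qquad
A'(1) = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}$$

and some initial distribution $\pi = (\tfrac12\ 0\ 0\ \tfrac12)$. The two resulting
input-output relations are not equivalent, e.g., $p^A(ab \mid 11) = \tfrac14$, but
$p^{A'}(ab \mid 11) = 0$. On the other hand, for any $v$, $y$, and $u$ one finds
easily that

$$\sum_v p^A(vy \mid u) = \pi A(u)\eta_y = \tfrac12 = \pi A'(u)\eta_y = \sum_v p^{A'}(vy \mid u)$$

so that the resulting input function is the same.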
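
The claims of Example 19 can be checked by direct computation. The Python sketch
below is an illustration under one explicit assumption that is not spelled out
in this section: in the Moore form, the output emitted after each input symbol
is taken to be the output $\Lambda$ of the state reached, so that
$p(y_1 \cdots y_n \mid x_1 \cdots x_n)$ is obtained by alternating a transition
step with a restriction to the states whose output is $y_t$. The helper names
are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

h = F(1, 2)
A0 = [[h, 0, 0, h] for _ in range(4)]
SSM_A = {"0": A0,
         "1": [[h, 0, 0, h], [0, h, h, 0], [h, 0, 0, h], [0, h, h, 0]]}
SSM_A2 = {"0": A0,
          "1": [[1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1]]}
out = ["a", "a", "b", "b"]          # outputs of s1, s2, s3, s4
pi = [h, 0, 0, h]


def step(row, M):
    """One transition of the state distribution `row` through matrix M."""
    return [sum(row[r] * M[r][s] for r in range(4)) for s in range(4)]


def io_prob(mats, outputs, inputs):
    """p(outputs | inputs), assuming the output is read off the state reached."""
    row = pi
    for x, y in zip(inputs, outputs):
        row = step(row, mats[x])
        row = [p if out[s] == y else 0 for s, p in enumerate(row)]
    return sum(row)


def input_fn(mats, y, inputs):
    """pi A(u) eta_y: probability that the state reached after u has output y."""
    row = pi
    for x in inputs:
        row = step(row, mats[x])
    return sum(p for s, p in enumerate(row) if out[s] == y)


print(io_prob(SSM_A, "ab", "11"), io_prob(SSM_A2, "ab", "11"))
print(input_fn(SSM_A, "a", "101"), input_fn(SSM_A2, "a", "101"))
```

Under this reading of the Moore form the two machines disagree on the pair
$(11, ab)$ while their input functions coincide on every input word tried.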

The above considerations show that input-word functions cannot generally
be reduced, in a unique way, to the other types of word functions discussed
before, and therefore a specific theory will be developed for them. On the other
hand many of the properties of input functions are similar to properties of the
other types of functions, and so are many proofs of related theorems. In all such
cases we shall omit those proofs, leaving them to the reader.


b. Generalized Events and Their Rank 


In deterministic automata theory an event $E$ is understood to be a subset of
the set of all words over a given alphabet. Such an event can be represented
by its characteristic function $f_E$ [$f_E(u) = 1$ if $u \in E$ and
$f_E(u) = 0$ otherwise]. We shall agree to term such an event a 0-1-event, the
term being a name for both the event and its characteristic function with values
either zero or one. Extending this terminology, any word function will be called
an event and a set of word functions will be called a generalized event.


Definition 2.2: Let $E_Z = \{f_i\}_{i \in Z}$ be a generalized event, and let
$u_1, \ldots, u_k, u'_1, \ldots, u'_l \in \Sigma^*$, $\eta_1, \ldots, \eta_l \in Z$.
Then $P(u_1, \ldots, u_k; (u'_1\eta_1), \ldots, (u'_l\eta_l))$ is the $k \times l$
matrix [to be called a compound sequence matrix] whose $ij$ element is
$f_{\eta_j}(u_i u'_j)$ and its rank is denoted by
$r(P(u_1, \ldots, u_k; (u'_1\eta_1), \ldots, (u'_l\eta_l)))$.


Definition 2.3: Let $E_Z$ be as in Definition 2.2, then $r(E_Z)$ (the rank of the
generalized event $E_Z$) is defined as

$$r(E_Z) = \sup\bigl\{k = r(P(u_1, \ldots, u_k; (u'_1\eta_1), \ldots, (u'_k\eta_k))) :
k = 1, 2, \ldots;\ u_1, \ldots, u_k, u'_1, \ldots, u'_k \in \Sigma^*;\ \eta_1, \ldots, \eta_k \in Z\bigr\}$$

[Thus $r(E_Z)$ is the maximal rank of a matrix of the form
$P(u_1, \ldots, u_k; (u'_1\eta_1), \ldots, (u'_k\eta_k))$ if such a maximal rank
exists.]


Theorem 2.1; Let E, be a generalized event induced by a valued (pseudo) 
Markov system with |S] states. Then r(E,) < |S]. [As before the prefix "pseudo" 
means that the vector z and the matrices A(c) are not required to be 
stochastic.] 


Proof: Under the conditions of the theorem, every matrix of the form
$P(u_1, \ldots, u_k; (u_1' n_1), \ldots, (u_l' n_l))$ can be expressed as a product of two
matrices: a left factor $G$ whose rows are the $|S|$-dimensional vectors $\pi(u_i)$,
and a right factor $H$ whose columns are the $|S|$-dimensional vectors
$\eta_{n_j}(u_j')$. Since $G$ has $|S|$ columns, the rank of the product $GH$ cannot
exceed $|S|$.
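
The factorization used in the proof can be checked numerically. The following is a minimal sketch, not part of the original text: the two-state pseudo system (`PI`, `A`, `ETA`) is a hypothetical example chosen for illustration, and the helper names are our own. It builds a compound sequence matrix from all words of length at most two and confirms that its rank does not exceed the number of states.

```python
from fractions import Fraction as F
from itertools import product

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(m):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [row[:] for row in m]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical valued pseudo Markov system with |S| = 2 over the alphabet {a, b};
# neither PI nor the matrices are stochastic.
PI = [[F(1), F(-1)]]
A = {"a": [[F(2), F(1)], [F(0), F(1, 2)]],
     "b": [[F(1), F(-1)], [F(3), F(0)]]}
ETA = {0: [[F(1)], [F(0)]], 1: [[F(1)], [F(2)]]}  # one column vector per index in Z

def f(i, word):
    """Input function f_i(u) = pi A(u) eta_i."""
    row = PI
    for s in word:
        row = matmul(row, A[s])
    return matmul(row, ETA[i])[0][0]

# Rows are indexed by words u, columns by pairs (u', n); the entry is f_n(u u').
words = ["".join(w) for k in range(3) for w in product("ab", repeat=k)]
cols = [(v, n) for v in words for n in ETA]
P = [[f(n, u + v) for (v, n) in cols] for u in words]

print(len(P), "x", len(P[0]), "matrix of rank", rank(P))
```

The matrix here has 7 rows and 14 columns, yet its rank is bounded by the two states of the underlying system, as the theorem asserts.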

Lemma 2.2: Let $E_Z$ be a generalized event of finite rank and let
$P(u_1, \ldots, u_k; (u_1' n_1), \ldots, (u_k' n_k))$ be a given compound sequence matrix
of maximal rank for it. Another compound sequence matrix of the same rank can be
derived from the given one, having the form

$P(\lambda, u_2, \ldots, u_k; (u_1' n_1), \ldots, (u_k' n_k))$


Proof: Same as the proof of Lemma 2.2 in Section I, C and left to the
reader.


Theorem 2.3: Let $E_Z$ be a generalized event of finite rank. Then there exists
a valued pseudo Markov system $A$ such that $E_Z$ is identical with the set of
input functions induced by $A$.


Proof: [Some of the details in the proof, being similar to the corresponding
parts in the proof of Theorem 1.5, will be omitted.] Let
$P(\lambda, u_2, \ldots, u_k; (u_1' n_1), \ldots, (u_k' n_k))$ be a compound sequence matrix
of maximal rank for $E_Z$ [such a matrix exists by the finite rank assumption and
by Lemma 2.2]. We shall denote this matrix by $P$. Consider the following determinant:

$\begin{vmatrix} & & & f_n(u_1 u') \\ & P & & \vdots \\ & & & f_n(u_k u') \\ f_{n_1}(u u_1') & \cdots & f_{n_k}(u u_k') & f_n(u u') \end{vmatrix} = 0$    (39)

The determinant, being of order $k + 1$ while the maximal rank is $k$, is equal to
zero for any variables $u, u' \in \Sigma^*$ and $n \in Z$ [all the other factors appearing
in the determinant, i.e., $u_1, \ldots, u_k$, $u_1', \ldots, u_k'$, $n_1, \ldots, n_k$, being constant
and $u_1 = \lambda$].

Developing the determinant according to its last column and dividing by $|P|$
we have

$f_n(u u') = \sum_{i=1}^{k} a_i(u) f_n(u_i u')$    (40)

where the values $a_i(u)$ are the resulting coefficients, depending on $u$ only.
Replacing $u$ by $u_i u$, $u'$ by $u' u_j'$, and $n$ by $n_j$ [$u$, $u'$, and $n$ are
variables in (40)] we have

$f_{n_j}(u_i u u' u_j') = \sum_{m=1}^{k} a_m(u_i u) f_{n_j}(u_m u' u_j')$    (41)

or in matrix form

$P(u u') = A(u) P(u')$    (42)

where we have used the definitions

$P(\lambda) = P$ and $P(u) = P(\lambda, \ldots, u_k; (u u_1' n_1), \ldots, (u u_k' n_k))$

and $A(u) = [a_m(u_i u)]$ is the matrix of corresponding coefficients. Thus,
setting $u' = \lambda$ in (42), we have in particular for $\sigma \in \Sigma$

$P(\sigma) = A(\sigma) P$ or $A(\sigma) = P(\sigma) P^{-1}$    (43)

and combining (42) and (43) we have

$A(u u') = A(u) A(u'), \quad A(\lambda) = I$    (44)

Consider again (40) and replace $u$ by $u_j u$ and $n$ by $i$. We have

$f_i(u_j u u') = \sum_{m=1}^{k} a_m(u_j u) f_i(u_m u')$    (45)

Let $\eta_i(u)$ be the column vector defined by
$\eta_i(u) = (f_i(u_1 u), \ldots, f_i(u_k u))^T$; then, comparing (45) with (41), we can
write (45) in the following matrix form:

$\eta_i(u u') = A(u) \eta_i(u'), \quad u, u' \in \Sigma^*, \ i \in Z$    (46)

But $u_1 = \lambda$ so that $\eta_i(\lambda) = (f_i(\lambda), \ldots, f_i(u_k))^T$ and by (46) we have that

$\eta_i(u) = A(u) \eta_i(\lambda) = (f_i(u), \ldots, f_i(u_k u))^T$    (47)

Define now the valued pseudo Markov system $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\})$ where
$|S| = k$; $A(\sigma)$ are the matrices as defined in (43); $\pi$ is the $k$-dimensional vector
$\pi = (1\ 0\ \cdots\ 0)$; and $\eta_i = \eta_i(\lambda)$ are the $k$-dimensional column vectors
defined by (47) with $u = \lambda$.
We have, for $u = \sigma_1 \cdots \sigma_m$ (using (44) and (47)), that

$f_i^A(u) = \pi A(\sigma_1) \cdots A(\sigma_m) \eta_i = (1\ 0\ \cdots\ 0) A(u) \eta_i = (1\ 0\ \cdots\ 0) \eta_i(u) = f_i(u)$
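
The construction in the proof is effective, and a small instance may help fix the ideas. The sketch below is an illustration under stated assumptions rather than the book's procedure verbatim: the event is supplied by a hypothetical hidden two-state system (`H_PI`, `H_A`, `H_ETA`), the row words and column pairs are chosen by hand so that $P$ is nonsingular, and all helper names are our own. It forms $A(\sigma) = P(\sigma)P^{-1}$, takes $\pi = (1\ 0)$ and $\eta_i = (f_i(u_1), f_i(u_2))^T$, and compares the rebuilt system with the given event on all short words.

```python
from fractions import Fraction as F
from itertools import product

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(m)
    aug = [row[:] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(n):
            if i != c:
                aug[i] = [x - aug[i][c] * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical hidden system; it is used only as a black box producing f_i(u).
H_PI = [[F(1), F(-1)]]
H_A = {"a": [[F(2), F(1)], [F(0), F(1, 2)]],
       "b": [[F(1), F(-1)], [F(3), F(0)]]}
H_ETA = {0: [[F(1)], [F(0)]], 1: [[F(1)], [F(2)]]}

def f(i, word):
    row = H_PI
    for s in word:
        row = matmul(row, H_A[s])
    return matmul(row, H_ETA[i])[0][0]

ROWS = ["", "a"]            # u_1 = empty word, u_2 (chosen by hand)
COLS = [("", 0), ("", 1)]   # pairs (u_j', n_j) (chosen by hand)

def P_of(u):
    """P(u): the entry in row i, column j is f_{n_j}(u_i u u_j')."""
    return [[f(n, r + u + v) for (v, n) in COLS] for r in ROWS]

P_INV = inverse(P_of(""))
A_NEW = {s: matmul(P_of(s), P_INV) for s in "ab"}     # equation (43)
ETA_NEW = {i: [[f(i, r)] for r in ROWS] for i in H_ETA}  # eta_i(lambda), from (47)
PI_NEW = [[F(1), F(0)]]

def f_new(i, word):
    row = PI_NEW
    for s in word:
        row = matmul(row, A_NEW[s])
    return matmul(row, ETA_NEW[i])[0][0]

words = ["".join(w) for k in range(5) for w in product("ab", repeat=k)]
agree = all(f(i, w) == f_new(i, w) for i in H_ETA for w in words)
print("rebuilt system reproduces the event:", agree)
```

Only finitely many values of the event enter the construction (those appearing in $P$, in the matrices $P(\sigma)$, and in the vectors $\eta_i$), which is the point taken up by Theorem 2.6 and Corollary 2.7.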


Combining Theorems 2.1 and 2.3 we have the following: 


Theorem 2.4: An event E, can be represented as the set of input functions in- 
duced by a valued pseudo Markov system A if and only if E, has finite rank. 

The proofs of the following corollaries and theorems are similar to the
corresponding proofs in the previous Section 1 and are left to the reader.


Corollary 2.5: Let $A$ be a valued pseudo Markov system as constructed in the
proof of Theorem 2.3 for a given generalized event $E_Z$ of finite rank. Let $G$ be
the matrix whose rows are $\pi(u_i)$ and let $H$ be the matrix whose columns are the
vectors $\eta_{n_j}(u_j')$, with $P = GH$. Then $G$ and $H$ are nonsingular. [The words
$u_i$, $u_j'$ and the $n_j$ are the fixed words and indices in the proof, with
$P(u_1, \ldots, u_k; (u_1' n_1), \ldots, (u_k' n_k))$ nonsingular.]


Theorem 2.6: Let $E_Z$ be a generalized event of rank $k$. Then a nonsingular $P$
matrix as in the proof of Theorem 2.3 can be found such that $P = [f_{n_j}(u_i u_j')]$
and $l(u_i u_j') \le 2k - 2$, $i, j = 1, 2, \ldots, k$.


Corollary 2.7: Let $E_Z$ be a generalized event of rank $k$; then the values $f_i(u)$
with $l(u) \le 2k - 1$ uniquely determine the whole event.

It follows from Theorem 2.6 and its Corollary 2.7 that if, and only if, a given 
generalized event is known to be of finite rank and a bound is given for its rank, 
then an underlying valued pseudo Markov system can be constructed effec- 
tively. 


c. A Necessary Condition for Representability 


The following theorem provides a useful necessary condition for a given
generalized event to be representable as a set of input functions of a valued
(pseudo) Markov system.


Theorem 2.8: Let $E_Z = \{f_i\}_{i \in Z}$ be a generalized event such that it can be
represented as a set of input functions of a valued pseudo Markov system. Then
for every $i \in Z$ and $u \in \Sigma^*$, there exists a set of numbers $c_0, \ldots, c_{k-1}$ such
that for every $u', u'' \in \Sigma^*$ the following equality holds:

$f_i(u' u^k u'') = c_{k-1} f_i(u' u^{k-1} u'') + \cdots + c_0 f_i(u' u'')$    (48)

If an underlying valued system can be found such that it is true Markov, then

$c_0 + c_1 + \cdots + c_{k-1} = 1$    (49)

Proof: Let $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\})$ be an underlying system for $E_Z$. The
matrix $A(u)$ satisfies its minimal polynomial, so that there exist numbers
$b_0, \ldots, b_k$ such that $b_0 I + \cdots + b_k [A(u)]^k = 0$. But $[A(u)]^k = A(u^k)$, so that
the equation above can be put in the following form [after dividing by $b_k$ and
transferring the last term to the right-hand side; thus $c_j = -b_j / b_k$]:

$c_0 I + \cdots + c_{k-1} A(u^{k-1}) = A(u^k)$

Multiplying each term in the equation by $\pi A(u')$ on the left and by $A(u'') \eta_i$ on
the right we have

$c_0 f_i(u' u'') + \cdots + c_{k-1} f_i(u' u^{k-1} u'') = f_i(u' u^k u'')$

If the system $A$ is true Markov, then $A(u)$ is a Markov matrix so that one of its
eigenvalues is equal to one. Inserting this eigenvalue into the minimal
polynomial we have

$b_0 + \cdots + b_{k-1} = -b_k$ or $c_0 + \cdots + c_{k-1} = 1$
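
A small numerical check of (48) and (49) may be instructive. The sketch below is an illustration only: the two-state true Markov system (`PI`, `A`, `ETA`) is a hypothetical example, and for simplicity the coefficients are taken from the characteristic polynomial of $A(u)$ (which, by the Cayley-Hamilton theorem, is also satisfied by $A(u)$) instead of the minimal polynomial used in the proof. For a $2 \times 2$ matrix $M$ this gives $M^2 = (\mathrm{tr}\, M) M - (\det M) I$, i.e., $c_1 = \mathrm{tr}\, M$ and $c_0 = -\det M$.

```python
from fractions import Fraction as F
from itertools import product

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical valued true Markov system with two states over {a, b}.
PI = [[F(1, 3), F(2, 3)]]
A = {"a": [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]],
     "b": [[F(0), F(1)], [F(2, 3), F(1, 3)]]}
ETA = [[F(5)], [F(-1)]]  # the value vector is unrestricted

def f(word):
    row = PI
    for s in word:
        row = matmul(row, A[s])
    return matmul(row, ETA)[0][0]

u = "ab"
M = matmul(A["a"], A["b"])                        # M = A(u)
c1 = M[0][0] + M[1][1]                            # trace of A(u)
c0 = -(M[0][0] * M[1][1] - M[0][1] * M[1][0])     # minus the determinant of A(u)

words = ["".join(w) for k in range(3) for w in product("ab", repeat=k)]
relation_48 = all(f(x + u + u + y) == c1 * f(x + u + y) + c0 * f(x + y)
                  for x in words for y in words)
print("relation (48) holds:", relation_48)
print("c0 + c1 =", c0 + c1)
```

Since the matrices here are stochastic, the coefficients sum to one, in agreement with (49); for a pseudo system the relation (48) would still hold but the sum need not equal one.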


d. Equivalent Valued Markov Systems 


Definition 2.4: Let $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\}_{i \in Z})$ and
$A' = (\pi', S', \{A'(\sigma)\}, \{\eta_i'\}_{i \in Z'})$ be two valued (pseudo) Markov systems
over the same alphabet $\Sigma$. $A$ is equivalent to $A'$ if there is a one-to-one
mapping $\phi: Z \to Z'$ such that $f_i^A(u) = f_{\phi(i)}^{A'}(u)$ for every $i \in Z$ and
every $u \in \Sigma^*$.

Given a valued [pseudo] Markov system $A$ one can construct effectively [using
a procedure similar to the one used in Section II, B, 1] two matrices $G$ and
$H$ such that $G$ has linearly independent rows of the form $\pi(u) = \pi A(u)$, and
any row vector of the form $\pi(u)$ is a linear combination of the rows of $G$; $H$
has linearly independent columns of the form $\eta_i(u) = A(u)\eta_i$, and any column
vector of the form $\eta_i(u)$ is a linear combination of the columns of $H$.

Using the above notations we can prove now the following: 


Theorem 2.9: Two valued pseudo Markov systems $A$ and $A'$ as in Definition
2.4 are equivalent if there exist a matrix $X$ of due dimensions and a mapping
$\phi: Z \to Z'$ such that (1) $\pi' X H = \pi H$; (2) $X A(\sigma) H = A'(\sigma) X H$ for every
$\sigma \in \Sigma$; (3) $\eta'_{\phi(i)} = X \eta_i$ for every $i \in Z$.

Proof: The proof is left to the reader. [The method used in the proof of
Theorem 1.10, with due changes to meet the different definitions, will do.]


Corollary 2.10: Two valued pseudo Markov systems $A$ and $A'$ as in Definition
2.4 are equivalent if there exist a matrix $X$ of due dimensions and a mapping
$\phi: Z \to Z'$ such that (1) $\pi' X = \pi$; (2) $X A(\sigma) = A'(\sigma) X$ for every
$\sigma \in \Sigma$; (3) $\eta'_{\phi(i)} = X \eta_i$ for every $i \in Z$.

Definition 2.5: A valued pseudo Markov system $A$ is minimal if the rank of its
induced generalized event equals the number of its states.

Lemma 2.11: Let $A$ be a minimal valued pseudo Markov system. Then any
$G^A$ and $H^A$ matrices for $A$ are nonsingular.


Proof: $G^A H^A$ is a compound sequence matrix for $A$ of maximal rank because
any other compound sequence matrix for $A$ can be written in the form $G' H'$
where the rows of $G'$ and the columns of $H'$ are linear combinations of the
rows of $G^A$ and the columns of $H^A$ correspondingly. It follows that
$\min(r(G^A), r(H^A)) \ge r(G^A H^A) = |S|$. But $G^A$ has $|S|$ columns and $H^A$ has
$|S|$ rows and therefore $r(G^A) \le |S|$, $r(H^A) \le |S|$. Thus, $r(G^A) = r(H^A) = |S|$
and both matrices are nonsingular.


Theorem 2.12: Let $A$ be a valued pseudo Markov system with $E_Z^A$ the
corresponding generalized event. Let $A'$ be the minimal valued pseudo Markov
system as constructed in Theorem 2.3 for the given $E_Z^A$ [by definition $A$ and $A'$
are equivalent]. Then a matrix $X$ exists such that (1) $\pi H^A = \pi' X H^A$; (2)
$X A(\sigma) H^A = A'(\sigma) X H^A$; (3) $\eta_i' = X \eta_i$.


The proof which is similar to the proof of Theorem 1.12 [with due changes 
to meet the different definitions] is left to the reader. 


Corollary 2.13: If the system $A$ in Theorem 2.12 is minimal, then there exists a
nonsingular matrix $X$ such that the necessary conditions of Theorem 2.12 can
be replaced by the following: (1) $\pi = \pi' X$; (2) $X A(\sigma) = A'(\sigma) X$;
(3) $\eta_i' = X \eta_i$.


Proof: If $A$ is minimal, then $H^A$ is nonsingular [see Lemma 2.11] and can be
cancelled in the conditions of Theorem 2.12. Furthermore, one can assume that
the matrix $X$ in the proof of Theorem 2.12 is a $G$ matrix for $A$ [see the proof
of Theorem 1.12], which by Lemma 2.11 is nonsingular in this case.


Corollary 2.14: Let $A$ and $A''$ be two equivalent valued pseudo Markov systems
such that $A''$ is minimal; then there exist a matrix $X$ and a one-to-one mapping
$\phi: Z \to Z''$ such that (1) $\pi H^A = \pi'' X H^A$; (2) $X A(\sigma) H^A = A''(\sigma) X H^A$;
(3) $\eta''_{\phi(i)} = X \eta_i$.

Proof: Let $A$ and $A'$ be as in Theorem 2.12, and let $A''$ and $A'$ be as in
Corollary 2.13 [with $A$ replaced by $A''$]. Then $\pi H^A = \pi' X' H^A$ for some matrix
$X'$ and $\pi' = \pi'' X^{-1}$ [$X$ in Corollary 2.13 is nonsingular], so that
$\pi H^A = \pi'' X^{-1} X' H^A$. Similarly, $X' A(\sigma) H^A = A'(\sigma) X' H^A$ and
$A'(\sigma) = X A''(\sigma) X^{-1}$, so that $X' A(\sigma) H^A = X A''(\sigma) X^{-1} X' H^A$ or
$X^{-1} X' A(\sigma) H^A = A''(\sigma) X^{-1} X' H^A$. Finally, $\eta_i' = X' \eta_i$ and
$X^{-1} \eta_i' = \eta_i''$, so that $\eta_i'' = X^{-1} X' \eta_i$, where the elements of $Z'$
and $Z''$ are rearranged if necessary so that the $\eta'$ and $\eta''$ vectors
corresponding to the same $\eta_i$ vector have the same index. The required matrix is
therefore $X^{-1} X'$.

As in the previous section Theorem 2.9 and its Corollary 2.10 can be used 
for finding a valued true Markov system equivalent to a given valued pseudo 
Markov system. For this purpose, the conditions of that theorem [or corollary] 
will be considered as equations with one of the systems known the other being 
required to satisfy the Markovian properties. 
The possibility of transforming valued pseudo Markov systems into valued
true Markov systems has practical significance, since the latter systems can be
constructed in practice using relays, transistors, or other electrical devices [see
Chapter I, Section 3].

If the A’ system in Theorem 2.9 is assumed to be known, then one can as- 
sume that A’ is also minimal, this additional assumption being justified by the 
fact that the construction in the proof of Theorem 2.3 provides a minimal 
equivalent system to any given system. In this case we have that the conditions 
of Theorem 2.9 are not only sufficient but also necessary [see Corollary 2.14 
above]. On the other hand the unknown system A has too many free param- 
eters making the use of the theorem impracticable. 

If the A system in Theorem 2.9 is assumed to be known, then the additional 
assumption that A is minimal [bearing on Theorem 2.3 as before] will make the 
conditions of Theorem 2.9 equivalent to the conditions of its Corollary 2.10, 
for in this case $H^A$ is a nonsingular matrix. On the other hand, the conditions
of Corollary 2.10 are only sufficient conditions, a fact which must be
remembered when one proves that they cannot be satisfied in some cases.

We shall now give a useful geometrical interpretation to the conditions of
Corollary 2.10 when considered as equations. Assume that the $A$ system in
Corollary 2.10 is given and consider the conditions in the corollary as equations
to be solved for an unknown system $A'$ subject to the restriction that $A'$ is true
Markovian. As there is no restriction on the vectors $\eta_i$ in the definition of a
valued Markov system, the third equation can be taken as a definition of the
vectors $\eta_i'$ once the other two equations are solved. If $A$ is a matrix, denote by
$C(A)$ the convex set of vectors generated by the rows of $A$. Then the equation
$X A(\sigma) = A'(\sigma) X$ can be solved for a Markovian matrix $A'(\sigma)$ and given $X$ if
and only if $C(X A(\sigma)) \subseteq C(X)$, for in this and only in this case each row of
$X A(\sigma)$ can be expressed as a convex combination of the rows of $X$, and the
probabilistic vector whose entries are the combination coefficients will be the
corresponding row of $A'(\sigma)$. Similarly, the first equation is equivalent to the
condition that $\pi \in C(X)$. We have thus proved the following:


Theorem 2.15: The conditions of Corollary 2.10, when considered as equations
with the system $A$ given, can be solved by a valued true Markov system
if and only if there exists a matrix $X$ such that (1) $\pi \in C(X)$; (2)
$C(X A(\sigma)) \subseteq C(X)$ for every $\sigma \in \Sigma$.

We shall use now Theorem 2.15 to prove two additional theorems. 
Theorem 2.16: Let $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\})$ be a valued pseudo Markov system
such that $\sum_i |\pi_i| \le 1$ where $\pi = (\pi_i)$, and if $\xi_i(\sigma) = (\xi_{ij}(\sigma))$ is the $i$th row in
$A(\sigma)$ then $\sum_j |\xi_{ij}(\sigma)| \le 1$ for $i = 1, 2, \ldots, |S|$. Then $A$ is equivalent to a valued
true Markov system $A'$ with state set $S'$ and $|S'| = 2|S|$.


Proof: Let $X$ be the $2|S| \times |S|$ matrix

$X = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -1 & 0 & \cdots & 0 \\ 0 & -1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & -1 \end{pmatrix}$

Let $X A(\sigma) = B(\sigma)$ with rows $\xi_i^B(\sigma) = (\xi_{ij}^B(\sigma))$. Then $\xi_i^B(\sigma) = \pm \xi_j^A(\sigma)$ for
some $j$, so that $\sum_j |\xi_{ij}^B(\sigma)| \le 1$. This implies that $C(B(\sigma)) \subseteq C(X)$, and
$\sum_i |\pi_i| \le 1$ implies that also $\pi \in C(X)$, and the conditions of Theorem 2.15 are
satisfied. The number of states of the resulting valued Markov system will be equal
to the number of rows of $X$, which is equal to $2|S|$.
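
The proof can be turned into an explicit construction. The following is a minimal sketch under stated assumptions: the pseudo system (`PI`, `A`, `ETA`) is a hypothetical example satisfying the absolute-sum conditions, the helper `convex_weights` is our own, and the way the unused probability mass is spread evenly over the pairs $(e_j, -e_j)$ is one convenient choice among many. Each vector $v$ with $\sum_j |v_j| \le 1$ is written as a convex combination of the rows of $X$ (the identity stacked on its negative), and the combination coefficients become the rows of $A'(\sigma)$ and the vector $\pi'$.

```python
from fractions import Fraction as F
from itertools import product

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def convex_weights(v):
    """Weights w >= 0 with sum(w) = 1 and w X = v, where X = [I; -I]."""
    n = len(v)
    slack = (1 - sum(abs(x) for x in v)) / (2 * n)
    assert slack >= 0, "the vector violates the absolute-sum condition"
    # Positive parts go on the rows +e_j, negative parts on the rows -e_j;
    # the slack is added to both rows of each pair, where it cancels.
    return [max(x, 0) + slack for x in v] + [max(-x, 0) + slack for x in v]

# Hypothetical valued pseudo Markov system with |S| = 2 over {a, b}.
PI = [[F(1, 2), F(-1, 4)]]
A = {"a": [[F(1, 2), F(-1, 2)], [F(0), F(-1)]],
     "b": [[F(-1, 4), F(1, 4)], [F(1, 3), F(2, 3)]]}
ETA = [[F(3)], [F(-2)]]

n = len(PI[0])
X = ([[F(int(i == j)) for j in range(n)] for i in range(n)]
     + [[F(-int(i == j)) for j in range(n)] for i in range(n)])

PI2 = [convex_weights(PI[0])]                                   # pi' X = pi
A2 = {s: [convex_weights(r) for r in matmul(X, m)] for s, m in A.items()}
ETA2 = matmul(X, ETA)                                           # eta' = X eta

def value(pi, mats, eta, word):
    row = pi
    for s in word:
        row = matmul(row, mats[s])
    return matmul(row, eta)[0][0]

words = ["".join(w) for k in range(5) for w in product("ab", repeat=k)]
equivalent = all(value(PI, A, ETA, w) == value(PI2, A2, ETA2, w) for w in words)
stochastic = all(sum(r) == 1 and min(r) >= 0
                 for m in list(A2.values()) + [PI2] for r in m)
print("equivalent:", equivalent, " stochastic:", stochastic)
```

The resulting system has $2|S| = 4$ states; its initial vector and matrices are stochastic while the value vector $\eta' = X\eta$ remains unrestricted, which is all that the definition of a valued true Markov system requires.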


Theorem 2.17: Let $E_Z = \{f_i\}_{i \in Z}$ be a generalized event of rank $k$. There exist
another generalized event $E_Z' = \{f_i'\}_{i \in Z}$ over the same alphabet $\Sigma$ and a
constant $c$ such that $E_Z'$ is induced by a valued true Markov system with $2k$ states
and for any $u \in \Sigma^*$ and any $i \in Z$, $c^{l(u)} f_i(u) = f_i'(u)$.


Proof: Let $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\})$ be the valued pseudo Markov system
constructed as in the proof of Theorem 2.3 for $E_Z$. Note that $|S| = k$ and $\pi$ is the
$k$-dimensional vector $\pi = (1\ 0\ \cdots\ 0)$. Let $A_c$ be a valued pseudo Markov system
derived from $A$ and defined as $A_c = (\pi, S, \{A_c(\sigma)\}, \{\eta_i\})$ with
$A_c(\sigma) = c A(\sigma)$. If $\{f_i^c\}$ are the functions induced by $A_c$, then clearly
$f_i^c(u) = c^{l(u)} f_i(u)$ for any $i \in Z$ and any $u \in \Sigma^*$. Now choose the constant $c$
so that $A_c$ will satisfy the conditions of Theorem 2.16 [the vector
$\pi = (1\ 0\ \cdots\ 0)$ already satisfies these conditions], which is of course possible.
By Theorem 2.16, there exists a valued true Markov system $A'$ with $2|S|$ states and
functions $\{f_i'\}$ such that $f_i'(u) = f_i^c(u) = c^{l(u)} f_i(u)$ for every $i \in Z$ and
$u \in \Sigma^*$.


Remark: Note that the scaling factor $c^{l(u)}$ depends on the length of the word
$u$ but not on the word itself. On the other hand, it is easy to see that
Theorem 2.17 would not be true in general if the scaling factor is removed, because
the values $f_i(u)$, being induced by a valued pseudo Markov system, may grow, in
some particular cases, beyond any bound when $l(u)$ increases, while the
corresponding values $f_i'(u)$, being induced by a valued true Markov system, are
bounded.


EXERCISES 


1. Prove that any function of a finite (pseudo) Markov chain can be represented
also as an input function of a valued pseudo Markov system.

2. Prove Lemma 2.2.

3. Prove Corollary 2.5.

4. Prove Theorem 2.6.

5. Prove Corollary 2.7.

6. Let $A$, $A'$, $A''$ be valued pseudo Markov systems such that $A$ and $A'$ satisfy
the conditions of Theorem 2.9 while $A'$ and $A''$ satisfy the conditions of
Corollary 2.10 [with $A$ and $A'$ in the corollary replaced by $A'$ and $A''$]. Then $A$ and
$A''$ satisfy the conditions of Theorem 2.9 [with $A'$ in that theorem replaced by
$A''$].

7. Let A and A’ be two equivalent valued pseudo Markov systems. Prove the 
following properties: 


a. If A’ is minimal then the number of states of A is greater than or equal 
to the number of states of A'. 

b. If both A and A’ are minimal, then they both have the same number of 
states. 

c. If both $A$ and $A'$ are minimal, then the corresponding matrices $A(\sigma)$ and
$A'(\sigma)$ of $A$ and $A'$ have the same set of distinct eigenvalues.


8. Let $E_Z$ be a generalized event and let $r([E_Z]_k)$ be the maximal rank of any
compound sequence matrix for $E_Z$ such that the values $f_i(u)$ making the entries
of these matrices have the property that $l(u) \le k$. Assume that for a given $E_Z$
we have that for some integer $k$, $r([E_Z]_k) = r([E_Z]_{k+1}) = \cdots = r([E_Z]_{k+j}) = t$.
Then either $r(E_Z) = t$ or $r(E_Z) > t + 2j$.

9. Based upon Exercise 8 give an algorithm for finding r(E,) when a given 
generalized event is known to be of finite rank and a bound is given on its 
rank. 


10. Let $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\})$ be a valued pseudo Markov system such that
$\pi = (\pi_i)$, $\pi_i \ge 0$, $\sum_i \pi_i \le 1$, and for any row $\xi_i(\sigma) = (\xi_{ij}(\sigma))$ in any
matrix $A(\sigma)$, $\xi_{ij}(\sigma) \ge 0$ and $\sum_j \xi_{ij}(\sigma) \le 1$. Then there exists a valued true
Markov system $A'$ with $|S| + 1$ states and equivalent to $A$.


11. Let $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\})$ be a valued pseudo Markov system such that
$\pi = (\pi_i)$, $0 \le \pi_i \le 1$, and all matrices $A(\sigma) = [a_{ij}(\sigma)]$ have nonnegative
entries and $\sum_{i=1}^{|S|} a_{ij}(\sigma) \le 1$. Then there exists a valued true Markov system
$A'$ with $2^{|S|}$ states and equivalent to $A$.


12. If $A$ and $A'$ are two valued pseudo Markov systems satisfying the conditions
of Corollary 2.10, with $X \ne 0$, then corresponding matrices $A(\sigma)$ and $A'(\sigma)$
have at least one common eigenvalue.

13. Let $A = (\pi, S, \{A(\sigma)\}, \{\eta_i\})$ be a valued pseudo Markov system such that
$\pi = (\pi_i)$, $0 \le \pi_i \le 1$, and the maximal eigenvalue, in absolute value, of any
matrix $A(\sigma) A^T(\sigma)$ is less than or equal to $1/|S|$. Then there exists a valued true
Markov system $A'$ with $2^{|S|}$ states which is equivalent to $A$.


14. Prove that in Theorem 2.17 the valued true Markov system $A'$ underlying
the generalized event $E_Z'$ may be assumed to have the additional property that
all the entries $\eta_{ij}'$ in all the vectors $\eta_i'$ of $A'$ have the property
$0 \le \eta_{ij}' \le 1$, but in this case the relation between the functions will be

$f_i'(u) = a c^{l(u)} f_i(u) + b$

for any $u \in \Sigma^*$ and $i \in Z$, where $a$, $b$, $c$ are constants.
15. Consider the following:
Definition: A word vector function is a function $\phi$ with domain $\Sigma^*$ and values
in the set of all $n$-dimensional real valued vectors, where $\Sigma^*$ is the set of all
words over a given alphabet $\Sigma$. A word vector function $\phi$ is realizable by a PA
[probabilistic automaton] if there exists a PA $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ such that
for every $x \in \Sigma^*$, $\phi(x) = \pi(x)$ [$\eta^F$ is a single vector having only 0-1 entries].
Prove the following:
Theorem: A word vector function $\phi$ is realizable by a PA if and only if the
following two conditions hold true.

1. For any $x \in \Sigma^*$ and $\sigma \in \Sigma$, if $\phi(x) = \sum_i a_i \phi(x_i)$, then
$\phi(x\sigma) = \sum_i a_i \phi(x_i \sigma)$, where $x_1, \ldots, x_k \in \Sigma^*$ and the $a_i$ are constants.


2. Let $\phi(x_1), \ldots, \phi(x_k)$ be any set of linearly independent vectors and let
$\sigma \in \Sigma$. There exists a stochastic matrix $A(\sigma)$ such that
$\phi(x_i) A(\sigma) = \phi(x_i \sigma)$.


OPEN PROBLEMS 


1. Find an algorithm for ascertaining whether a given generalized event $E_Z$ of
rank $k$ can be represented as a set of input functions of a valued true Markov
system.


2. Find an algorithm for ascertaining whether a given generalized event $E_Z$ of
rank $k$ can be represented as a set of input functions of a valued true Markov
system with $k$ states.


3. Define, in a meaningful way, and study "output functions" induced by
nonhomogeneous Markov systems with more than one letter in the alphabet $\Sigma$.
Chapter III


Events, Languages, and Acceptors


INTRODUCTION 


This chapter is devoted to probabilistic languages and events. The closure 
properties of those languages and events and their relation to regular events 
are studied. Some particular cases such as definite, quasidefinite, and exclusive 
events are investigated and the problem of approximating probabilistic events 
by nonprobabilistic ones is considered. 


A. EVENTS 


Although the abstract models to be considered in this chapter are particular 
cases of models discussed in the previous chapter, the problems to be investi- 
gated are different and motivated by the approach of the mathematical logic 
discipline to parallel problems encountered in the deterministic case. The 
following notations will be used: An event is a single word function $f$ over an
alphabet $\Sigma$ [$f: \Sigma^* \to$ real numbers] with the following subcases: an event $f$
is pseudo probabilistic if it can be represented as the function induced by a
valued pseudo Markov system with a single vector in the set $\{\eta_i\}$; a pseudo
probabilistic event is an expectation event if the underlying system is a valued
true Markov system; an expectation event is a probabilistic event if the
underlying system has the additional property that all the entries of the (single)
column vector are equal to 1 or 0. We shall write $\eta^F$ instead of $\eta$, with $F \subseteq S$
and the $i$th entry of $\eta^F$ equal to 1 if and only if $s_i \in F$. Such a system will be
called a probabilistic automaton.

A probabilistic event $f$ is a regular event if the underlying probabilistic
automaton is deterministic and the function $f$ can assume only the values 0 or 1.
The term regular event will be used both for the function $f$ as above and for
the set of words $u$ such that $f(u) = 1$. [This abuse of language is made in order
to simplify the notations and no confusion will arise as long as context is clear.]

An event $f$ is called constant if $f(u) = c$ for all $u \in \Sigma^*$, where $c$ is a constant
[real number]. The following operations on events are defined [any two events,
when combined, are assumed to be defined over the same alphabet $\Sigma$]:


1. $(f + g)(u) = f(u) + g(u)$ for any $u \in \Sigma^*$.

2. $(fg)(u) = f(u)g(u)$ for any $u \in \Sigma^*$.

3. $(\alpha f)(u) = \alpha(f(u))$ for any $u \in \Sigma^*$ and $\alpha$ a real number.

4. $(f \vee g)(u) = \max(f(u), g(u))$ for any $u \in \Sigma^*$.

5. $(f \wedge g)(u) = \min(f(u), g(u))$ for any $u \in \Sigma^*$.

6. $\bar{f}(u) = 1 - f(u)$ for any $u \in \Sigma^*$.

7. $\tilde{f}(u) = f(\tilde{u})$ where $\tilde{u} = \sigma_k \cdots \sigma_1$ if $u = \sigma_1 \cdots \sigma_k \in \Sigma^*$.

Some additional operations will be considered later and defined in due 
place. 


1. Probabilistic Events 


By definition, the class of PEs [probabilistic events] contains, as a proper subclass,
the class of regular events. In addition it also contains the constant events, as
proved in the following:

Proposition 1.1: The constant functions $f(u) = c$ with $0 \le c \le 1$ are PEs.

Proof: Let $f$ be the function $f(u) = c$ for all $u \in \Sigma^*$ and $0 \le c \le 1$. Let
$A = (\pi, S, \{A(\sigma)\}, \eta^F)$ be an automaton over any alphabet $\Sigma$ such that
$S = \{1, 2\}$; $\pi = (c \ \ 1 - c)$; $\eta^F = (1 \ \ 0)^T$ and

$A(\sigma) = \begin{pmatrix} c & 1 - c \\ c & 1 - c \end{pmatrix}$ for all $\sigma \in \Sigma$

Then

$f(u) = \pi A(u) \eta^F = (c \ \ 1 - c) \begin{pmatrix} c & 1 - c \\ c & 1 - c \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = c$

since $A(\sigma)$ are constant matrices and, therefore, $A(u) = A(\sigma)$ for any nonempty
$u \in \Sigma^*$ [for $u = \lambda$ we have directly $\pi \eta^F = c$].
[For the definition of constant matrices see the Preliminary Section.]
Proposition 1.2: If $f$ is a PE, then so is its complement $\bar{f} = 1 - f$.


Proof: Let $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ be the underlying probabilistic automaton
(PA) for $f$. Let $\eta$ be a column vector all the entries of which are equal to 1.
Let $\eta^{\bar F}$ be the vector such that $\eta^F + \eta^{\bar F} = \eta$. For any $u \in \Sigma^*$, we have that
$\pi A(u)(\eta^F + \eta^{\bar F}) = \pi(u)\eta = 1$, since $\pi(u)$ is a probabilistic vector. Therefore,
$\bar{f}(u) = \pi(u)\eta^{\bar F} = 1 - \pi(u)\eta^F$. Thus, $\bar{A} = (\pi, S, \{A(\sigma)\}, \eta^{\bar F})$ defines the
function $\bar{f}$.

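Proposition 1.3: If $f$ and $g$ are PEs, then also $fg$ is a PE.


Proof: Let $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ and $A' = (\pi', S', \{A'(\sigma)\}, \eta^{F'})$ be the
respective underlying PAs for $f$ and $g$. Define
$A \otimes A' = (\pi \otimes \pi', S \times S', \{A(\sigma) \otimes A'(\sigma)\}, \eta^F \otimes \eta^{F'})$ where
$\otimes$ denotes the Kronecker product [see Definition 1.2 and
Lemma 1.1 in Section II, B, 1]. Then

$f^{A \otimes A'}(u) = (\pi \otimes \pi')(A(\sigma_1) \otimes A'(\sigma_1)) \cdots (A(\sigma_k) \otimes A'(\sigma_k))(\eta^F \otimes \eta^{F'})$
$= (\pi A(\sigma_1) \cdots A(\sigma_k) \eta^F)(\pi' A'(\sigma_1) \cdots A'(\sigma_k) \eta^{F'})$
$= f(u) g(u)$ where $u = \sigma_1 \cdots \sigma_k$

Since $A \otimes A'$ is a PA, the proposition is proved.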
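
The Kronecker product construction is easy to carry out mechanically. The following sketch is illustrative only: both automata are hypothetical examples (the second is the constant-event automaton of Proposition 1.1 with $c = 1/3$, so that the product also illustrates Corollary 1.4), and the function names are our own.

```python
from fractions import Fraction as F
from itertools import product

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def kron(a, b):
    """Kronecker product: rows indexed by (i, k), columns by (j, l)."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def event(pi, mats, eta, word):
    row = pi
    for s in word:
        row = matmul(row, mats[s])
    return matmul(row, eta)[0][0]

h = F(1, 2)
# Hypothetical PA for f, with F = {2}.
PI_F = [[F(1), F(0)]]
A_F = {"a": [[h, h], [F(0), F(1)]], "b": [[F(1), F(0)], [h, h]]}
ETA_F = [[F(0)], [F(1)]]

# PA of Proposition 1.1 for the constant event g(u) = c.
c = F(1, 3)
PI_G = [[c, 1 - c]]
A_G = {s: [[c, 1 - c], [c, 1 - c]] for s in "ab"}
ETA_G = [[F(1)], [F(0)]]

# The product automaton on the state set S x S'.
PI_P = kron(PI_F, PI_G)
A_P = {s: kron(A_F[s], A_G[s]) for s in "ab"}
ETA_P = kron(ETA_F, ETA_G)

words = ["".join(w) for k in range(5) for w in product("ab", repeat=k)]
is_product = all(event(PI_P, A_P, ETA_P, w)
                 == event(PI_F, A_F, ETA_F, w) * event(PI_G, A_G, ETA_G, w)
                 for w in words)
still_pa = (all(sum(r) == 1 and min(r) >= 0 for m in A_P.values() for r in m)
            and sum(PI_P[0]) == 1 and all(e[0] in (0, 1) for e in ETA_P))
print("f^(A x A') = f g:", is_product, " product is a PA:", still_pa)
```

The check `still_pa` reflects the closing remark of the proof: the Kronecker product of stochastic matrices is stochastic, and the product of two 0-1 vectors is again a 0-1 vector.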


Corollary 1.4: If $f$ is a PE and $c$ is a number $0 \le c \le 1$, then $cf$ is a PE.
Proof: Let $g$ in Proposition 1.3 be the constant event $g(u) = c$ and use
Proposition 1.1.


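Proposition 1.5: Let $f$, $g$, $h$ be PEs. Then the function $fh + g\bar{h}$ [where
$\bar{h} = 1 - h$] is a PE.

Proof: Let $A = (\pi, S, \{A(\sigma)\}, \eta^F)$, $A' = (\pi', S', \{A'(\sigma)\}, \eta^{F'})$ and
$A'' = (\pi'', S'', \{A''(\sigma)\}, \eta^{F''})$ be the underlying PAs for $f$, $g$, and $h$
respectively. Define the PA $B$ as

$B = (S \times S' \times S'', \ \pi \otimes \pi' \otimes \pi'', \ \{A(\sigma) \otimes A'(\sigma) \otimes A''(\sigma)\},$
$\eta^F \otimes \eta^{S'} \otimes \eta^{F''} + \eta^S \otimes \eta^{F'} \otimes \eta^{\bar F''})$

Note that if an entry in one of the two products of $\eta$ vectors is equal to 1, then
the corresponding entry in the second vector is equal to 0 [since $\eta^{F''}$ has a zero
entry if and only if the corresponding entry in $\eta^{\bar F''}$ is equal to one] and
therefore the sum of the two vectors has only zero or one entries.
Now [$\eta^S$ is an $|S|$-dimensional vector with all its entries equal to 1]

$f^B(u) = (\pi \otimes \pi' \otimes \pi'')(A(u) \otimes A'(u) \otimes A''(u))(\eta^F \otimes \eta^{S'} \otimes \eta^{F''})$
$+ (\pi \otimes \pi' \otimes \pi'')(A(u) \otimes A'(u) \otimes A''(u))(\eta^S \otimes \eta^{F'} \otimes \eta^{\bar F''})$
$= (\pi A(u) \eta^F)(\pi' A'(u) \eta^{S'})(\pi'' A''(u) \eta^{F''})$
$+ (\pi A(u) \eta^S)(\pi' A'(u) \eta^{F'})(\pi'' A''(u) \eta^{\bar F''})$
$= f(u)h(u) + g(u)\bar{h}(u)$

as required.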
Corollary 1.6: Let $f_1, \ldots, f_k$, $h_1, \ldots, h_k$ be two sets of PEs such that
$\sum_i h_i = 1_c$, where $1_c$ denotes the constant function with all its values equal to 1.
Then $\sum_i f_i h_i$ is a PE.


Proof: The proof is a trivial extension of the proof of Proposition 1.5 and
is left to the reader.


Corollary 1.7: Let $f_1, \ldots, f_k$ be a set of PEs and let $a_1, \ldots, a_k$ be a set of
numbers with $0 \le a_i$ and $\sum_i a_i = 1$. Then $\sum_i a_i f_i$ is a PE.


Proof: Replace the functions $h_i$ in Corollary 1.6 by the constant functions
$h_i = a_i$.


Theorem 1.8: Let $f$ be a PE; then the reversed event $\tilde{f}$ [$\tilde{f}(u) = f(\tilde{u})$,
with $\tilde{u}$ the word $u$ written in reverse order] is also a PE.


Proof: Let $f$ be defined by the PA $A = (\pi, S, \{A(\sigma)\}, \eta^F)$; then $\tilde{f}$ is defined
by the pseudo probabilistic automaton (SPA) $A^T = ((\eta^F)^T, S, \{A^T(\sigma)\}, \pi^T)$. To
prove this let $u = \sigma_1 \cdots \sigma_k$; then

$f^{A^T}(u) = (\eta^F)^T A^T(\sigma_1) \cdots A^T(\sigma_k) \pi^T$
$= (\pi A(\sigma_k) \cdots A(\sigma_1) \eta^F)^T$
$= \pi A(\sigma_k) \cdots A(\sigma_1) \eta^F = f(\tilde{u}) = \tilde{f}(u)$

We must prove that $A^T$ has an equivalent PA. To this end, let $X$ be a
$2^{|S|} \times |S|$ matrix whose rows are all the $|S|$-dimensional vectors with entries zero
or one. Then $(\eta^F)^T$ is a row of $X$. In addition $C(X A^T(\sigma)) \subseteq C(X)$ [$C(A)$ denotes
the convex set of vectors generated by the rows of $A$]. This follows from the fact that
multiplying a row of $X$ by $A^T(\sigma)$ amounts to the summing up of some of the rows
of $A^T(\sigma)$ [the rows of $X$ have only zero and one entries]. But the rows of $A^T(\sigma)$
are columns of $A(\sigma)$, which is stochastic, so that the resulting vector has all its
entries between zero and one and therefore belongs to $C(X)$. The conditions
of Theorem 2.15 in Section II, C are thus satisfied and therefore there exists
an SPA $A' = (\pi', S', \{A'(\sigma)\}, \eta')$ such that $A^T$ is equivalent to $A'$,
$|S'| = 2^{|S|}$, $\pi'$ and the matrices $A'(\sigma)$ are stochastic, and $\eta' = X \pi^T$ [by the
construction in Corollary 2.10 of Section II, C]. Let $X^i$ be the $i$th column of $X$, let
$\pi = (\pi_i)$, and define the following PAs derived from $A'$:
$A_i = (\pi', S', \{A'(\sigma)\}, X^i)$. Each $A_i$ is a PA because the vectors $X^i$ have only
zero and one entries. Let $f_i'$ be the PE induced by $A_i$. Then,

$\sum_i \pi_i f_i'(u) = \sum_i \pi_i (\pi' A'(u) X^i)$
$= \pi' A'(u) \sum_i \pi_i X^i = \pi' A'(u) X \pi^T$
$= \pi' A'(u) \eta' = f^{A'}(u) = f^{A^T}(u) = \tilde{f}(u)$

But the $f_i'$ are PEs and therefore, by Corollary 1.7, also $\tilde{f}$ is a PE.
Proposition 1.9: Let $f$ and $g$ be PEs and let $h$ be the function defined as
$h(u) = 1$ if $f(u) \ge g(u)$ and $h(u) = 0$ if $f(u) < g(u)$. If $h$ is a regular event,
then $f \vee g$ and $f \wedge g$ are PEs.


Proof: One verifies easily that $\max(f, g) = fh + g\bar{h}$ and $\min(f, g) = f\bar{h} + gh$
[with $\bar{h} = 1 - h$], so that this proposition is a particular case of Proposition 1.5.

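Theorem 1.10: The class of PEs is not closed in general under the operations
$\vee$ and $\wedge$.


Proof: Let $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ be the automaton such that $\Sigma = \{a, b\}$,
$S = \{1, 2, 3, 4\}$, $F = \{1, 4\}$, $\pi = (\tfrac{1}{2} \ 0 \ \tfrac{1}{2} \ 0)$ and

$A(a) = \begin{pmatrix} \tfrac12 & \tfrac12 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad A(b) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \tfrac12 & \tfrac12 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

Let $n_a(u)$ and $n_b(u)$ be the number of occurrences of $a$ and $b$ respectively in the
word $u$. It is easily verified that

$f^A(u) = \tfrac12 + \tfrac12 \left( 2^{-n_a(u)} - 2^{-n_b(u)} \right)$

Thus

$f^A(u) = \tfrac12$ if $n_a(u) = n_b(u)$,
$f^A(u) > \tfrac12$ if $n_a(u) < n_b(u)$,
$f^A(u) < \tfrac12$ if $n_a(u) > n_b(u)$.

Let $g(u)$ be the constant event $g(u) = \tfrac12$ for all $u \in \Sigma^*$. Then $f^A \vee g$
and $f^A \wedge g$ are not probabilistic events. In fact we shall prove that the
above events are not even pseudo probabilistic [a class which includes the class
of PEs]. Assume the contrary; then there exists a pseudo Markov system $B$
whose input function is $f^B = f^A \vee g$, so that for any integer $k$ we have that

$f^B(a^k b^{k+1}) = \max( f^A(a^k b^{k+1}), \tfrac12 ) = f^A(a^k b^{k+1}) > \tfrac12$

and for $i \le k$

$f^B(a^k b^i) = \max( f^A(a^k b^i), \tfrac12 ) = \tfrac12$

By Theorem 2.8 in Section II, C with $u' = a^k$, $u = b$, and $u'' = \lambda$ [the relation
(48) being taken of order $k + 1$, which is possible once $k + 1$ is not smaller than the
degree of the minimal polynomial of the matrix of $B$ corresponding to $b$], there are
constants $c_0, \ldots, c_k$ such that

$f^B(a^k b^{k+1}) = c_k f^B(a^k b^k) + \cdots + c_1 f^B(a^k b) + c_0 f^B(a^k)$

implying that $\tfrac12 < \tfrac12 \sum_{i=0}^{k} c_i$, while if $u' = a^{k+1}$, $u = b$, and $u'' = \lambda$ we have,
for the same set of constants depending on $u$ only, that

$f^B(a^{k+1} b^{k+1}) = c_k f^B(a^{k+1} b^k) + \cdots + c_0 f^B(a^{k+1})$

or $\tfrac12 = \tfrac12 \sum_{i=0}^{k} c_i$, which is impossible. $f^A \vee g$ is thus proved not to be a
pseudo probabilistic event and the proof for $f^A \wedge g$ is similar.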
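
The behaviour of the four-state automaton used in this proof can be confirmed by direct computation. The sketch below rests on our reading of the (poorly printed) matrices: state 1 is left for the absorbing state 2 with probability one half on each letter $a$, state 3 is left for the absorbing state 4 with probability one half on each letter $b$, $\pi = (\tfrac12\ 0\ \tfrac12\ 0)$ and $F = \{1, 4\}$. Under these assumptions the closed form is $f(u) = \tfrac12 + \tfrac12(2^{-n_a(u)} - 2^{-n_b(u)})$, and the values of $\max(f, \tfrac12)$ on the words $a^k b^i$ are exactly those exploited in the argument.

```python
from fractions import Fraction as F
from itertools import product

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

h = F(1, 2)
PI = [[h, F(0), h, F(0)]]
A = {"a": [[h, h, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
     "b": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, h, h], [0, 0, 0, 1]]}
ETA_F = [[1], [0], [0], [1]]  # F = {1, 4}

def f(word):
    row = PI
    for s in word:
        row = matmul(row, A[s])
    return matmul(row, ETA_F)[0][0]

def closed_form(word):
    na, nb = word.count("a"), word.count("b")
    return h + h * (h ** na - h ** nb)

words = ["".join(w) for k in range(7) for w in product("ab", repeat=k)]
formula_ok = all(f(w) == closed_form(w) for w in words)

def f_or_g(word):
    """The event f v g with g the constant event 1/2."""
    return max(f(word), h)

# For i <= k the value on a^k b^i is exactly 1/2; for i = k + 1 it exceeds 1/2.
pattern_ok = all(f_or_g("a" * k + "b" * i) == h
                 for k in range(6) for i in range(k + 1)) \
    and all(f_or_g("a" * k + "b" * (k + 1)) > h for k in range(6))
print("closed form:", formula_ok, " pattern used in the proof:", pattern_ok)
```

The check does not, of course, prove that $f \vee g$ fails to be pseudo probabilistic; it only confirms the values on which the contradiction with Theorem 2.8 is built.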
Corollary 1.11: The class of pseudo probabilistic events (and therefore also
the class of probabilistic events) is a proper subclass of the class of events.


2. Pseudo Probabilistic Events 


The class of SPEs [pseudo probabilistic events] includes, as a proper subclass,
the class of PEs, since the values of a PE $f$ are bounded while the values of an
SPE may increase beyond any bound. On the other hand, not every event is
an SPE; this has been proved by Corollary 1.11. One can prove now, in the
same way as in the previous section, that:


1. The SPEs include all constant functions $f = c$ where $c$ is any real number.
2. If $f$ is an SPE, then so are $\bar{f}$, $\tilde{f}$, and $\alpha f$ where $\alpha$ is any real number.
3. If $f$ and $g$ are SPEs, then so is $f \cdot g$.


The SPEs have also the following properties: 
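Proposition 2.1: If $f$ and $g$ are SPEs then so is $f + g$.

Proof: Let $f$ be defined by $A = (\pi, S, \{A(\sigma)\}, \eta)$ and $g$ by
$A' = (\pi', S', \{A'(\sigma)\}, \eta')$. Let $A''$ be the system
$A'' = (\pi'', S'', \{A''(\sigma)\}, \eta'')$ with $S'' = S \cup S'$,
$\pi'' = (\pi \ \ \pi')$, $\eta'' = (\eta^T \ \ \eta'^T)^T$
and

$A''(\sigma) = \begin{pmatrix} A(\sigma) & 0 \\ 0 & A'(\sigma) \end{pmatrix}$

One verifies easily that $f^{A''} = f^A + f^{A'}$.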
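
The direct sum construction can be verified on a small instance. In the sketch below both pseudo systems are hypothetical examples with arbitrary rational entries, and the helper names are our own; the initial vectors are concatenated, the value vectors are stacked, and the matrices are placed on the diagonal blocks.

```python
from fractions import Fraction as F
from itertools import product

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def direct_sum(m1, m2):
    """Block diagonal matrix with m1 and m2 on the diagonal."""
    top = [row + [F(0)] * len(m2[0]) for row in m1]
    bottom = [[F(0)] * len(m1[0]) + row for row in m2]
    return top + bottom

def event(pi, mats, eta, word):
    row = pi
    for s in word:
        row = matmul(row, mats[s])
    return matmul(row, eta)[0][0]

# Hypothetical pseudo system for f (two states).
PI_F = [[F(2), F(-1)]]
A_F = {"a": [[F(1), F(3)], [F(0), F(-2)]], "b": [[F(1, 2), F(0)], [F(1), F(1)]]}
ETA_F = [[F(1)], [F(4)]]

# Hypothetical pseudo system for g (one state).
PI_G = [[F(3)]]
A_G = {"a": [[F(-1)]], "b": [[F(2)]]}
ETA_G = [[F(1, 3)]]

PI_SUM = [PI_F[0] + PI_G[0]]                      # (pi  pi')
A_SUM = {s: direct_sum(A_F[s], A_G[s]) for s in "ab"}
ETA_SUM = ETA_F + ETA_G                           # eta stacked on eta'

words = ["".join(w) for k in range(5) for w in product("ab", repeat=k)]
is_sum = all(event(PI_SUM, A_SUM, ETA_SUM, w)
             == event(PI_F, A_F, ETA_F, w) + event(PI_G, A_G, ETA_G, w)
             for w in words)
print("f^(A'') = f + g:", is_sum)
```

Note that the concatenated vector $(\pi \ \pi')$ is in general not stochastic even when $\pi$ and $\pi'$ are, which is why this construction yields closure under addition for SPEs only.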
Proposition 2.2: Let $f$ and $g$ be SPEs and let $h$ be the event

$h(u) = 1$ if $f(u) \ge g(u)$, and $h(u) = 0$ if $f(u) < g(u)$.

If $h$ is regular, then the events $f \vee g$ and $f \wedge g$ are SPEs.


Proof: Similar to the proof of Proposition 1.9.

Proposition 2.3: The class SPE is not closed in general under the operations
$\vee$ and $\wedge$.


Proof: The proof is included in the proof of Theorem 1.10, for the functions
$f$ and $g$ used in that proof are PEs and therefore also SPEs, while $f \vee g$ and
$f \wedge g$ were proved not to be SPEs.


Theorem 2.4: Let f be an SPE, there exists a PE g and constant numbers b, c, d 
with d > 0 and 0 < c < 1 such that dg(u) — b = c'?f(u) for any u c X*. 


Proof: Let $A = (\pi, S, \{A(\sigma)\}, \eta)$ be the underlying system for f. We have
proved already [Theorem 2.17 in Section II, C] that there exists a system
$A' = (\pi', S', \{A'(\sigma)\}, \eta')$ such that the vector $\pi'$ and the matrices $A'(\sigma)$ are
stochastic, and $f'(u) = c^{l(u)} f(u)$ for some constant c and any $u \in \Sigma^*$. Two
additional transformations are needed in order to change $\eta'$ so as to fit the
definitions of a PA. Let $A''$ be the system derived from $A'$ defined as
$A'' = (\pi', S', \{A'(\sigma)\}, a(\eta' + \mathbf{b}))$ where $\mathbf{b}$ is a column vector all the entries of which
are equal to b, a and b being two numbers chosen in a way such that all the
entries of the vector $a(\eta' + \mathbf{b})$ are between zero and one. As $a(\eta' + \mathbf{b})$ has
entries between zero and one, it can be expressed as a convex combination of
a set of vectors $\{\eta_i\}$ such that the entries in any vector $\eta_i$ are either zero or one.
Thus, $a(\eta' + \mathbf{b}) = \sum_i \alpha_i \eta_i$ with $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$, $\eta_i = (\eta_{ij})$ and either $\eta_{ij} = 1$
or $\eta_{ij} = 0$ for all i and j. Let $A_i$ be the PA $A_i = (\pi', S', \{A'(\sigma)\}, \eta_i)$; then for any
$u \in \Sigma^*$ we have that

$$\begin{aligned}
\sum_i \alpha_i f^{A_i}(u) &= \sum_i \alpha_i \bigl(\pi' A'(u)\eta_i\bigr) = \pi' A'(u) \sum_i \alpha_i \eta_i \\
&= \pi' A'(u)\,[a(\eta' + \mathbf{b})] = a\bigl(\pi' A'(u)\eta' + \pi' A'(u)\mathbf{b}\bigr) \\
&= a f'(u) + ab
\end{aligned}$$

[since $\pi' A'(u)$ is a stochastic vector and all the entries in $\mathbf{b}$ are equal to b]. Let
g be the event defined as $g(u) = \sum_i \alpha_i f^{A_i}(u)$. It follows from Corollary 1.7 [the
$f^{A_i}$ are PEs] that g is a PE. But $g(u) = a f'(u) + ab = a c^{l(u)} f(u) + ab$. Thus
$a^{-1} g(u) - b = c^{l(u)} f(u)$ for any $u \in \Sigma^*$. Setting $a^{-1} = d$ will complete the
proof. It follows from the proof that a > 0 and therefore also d > 0, while
the constant c satisfies 0 < c < 1. ∎
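
The convex decomposition used in the proof can be made explicit. The following Python sketch uses one possible construction, a threshold (layer) decomposition; this choice, and the function name `decompose`, are assumptions of the illustration and are not taken from the text.

```python
from fractions import Fraction

def decompose(xi):
    """Write a vector xi with entries in [0, 1] as a convex combination
    of 0/1 vectors.  Returns a list of (weight, vector) pairs whose
    weights are nonnegative and sum to 1."""
    xi = [Fraction(v) for v in xi]
    assert all(0 <= v <= 1 for v in xi)
    # Distinct entry values, together with 0 and 1, cut [0, 1] into layers.
    levels = sorted(set(xi) | {Fraction(0), Fraction(1)})
    parts = []
    for lo, hi in zip(levels, levels[1:]):
        # Coordinates whose value reaches `hi` receive the whole layer (lo, hi].
        indicator = [1 if v >= hi else 0 for v in xi]
        parts.append((hi - lo, indicator))
    return parts
```

Each coordinate collects exactly the layers lying below its own value, so the weighted sum of the 0/1 vectors reproduces the given vector, and the layer widths add up to 1.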


EXERCISES 


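1. Let f and g be PEs. Find a PE h such that $h(u) > \tfrac12$ if f(u) > g(u) and
$h(u) < \tfrac12$ if f(u) < g(u).
2. Let f be a PE. Prove that the sets {u: f(u) = 0}, {u: f(u) > 0}, {u: f(u) = 1}
are regular events.
3. Let $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ and $A' = (\pi', S', \{A'(\sigma)\}, \eta^{F'})$ be PAs with
$S \cap S' = \emptyset$, and let $A'' = (\pi'', S'', \{A''(\sigma)\}, \eta^{F''})$ be the PA defined by
$\pi'' = (\alpha\pi \;\; \beta\pi')$, $\alpha, \beta > 0$ and $\alpha + \beta = 1$; $S'' = S \cup S'$,
$\eta^{F''} = \begin{pmatrix} \eta^F \\ \eta^{F'} \end{pmatrix}$ and

$$A''(\sigma) = \begin{pmatrix} A(\sigma) & 0 \\ 0 & A'(\sigma) \end{pmatrix}$$

Show that $f^{A''}(u) = \alpha f^{A}(u) + \beta f^{A'}(u)$ for all $u \in \Sigma^*$.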


4. Prove Corollary 1.7 using the construction in Exercise 3 above, and show
that the resulting automaton is more economical [as to the number of its states]
when the construction in Exercise 3 is used.


5. Let $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ be a PA. Find an equivalent system
$A' = (\pi', S', \{A'(\sigma)\}, \eta')$ such that $\pi'$ has the form $\pi' = (1\ 0\ \cdots\ 0)$,
$|S'| = |S| + 1$, the matrices $A'(\sigma)$ are stochastic, and the vector $\eta' = (\eta'_i)$
has the property that $0 \le \eta'_i \le 1$ for all i.


6. Use the construction in Exercise 5 above in the proof of Theorem 1.8 to 
replace the use of Corollary 1.7 and show that the resulting PA for f is more 
economical [as to the number of its states] if the construction in Exercise 5 is 
used [even if the construction in Exercise 3 is used for proving Corollary 1.7]. 


7. Prove that any finite dimensional vector $\xi = (\xi_i)_{i=1}^{n}$ such that $0 \le \xi_i \le 1$
can be expressed in the form $\xi = \sum_i \alpha_i \zeta^i$ where the $\zeta^i$ are vectors all the entries
of which are 0 or 1 and $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$. Provide an explicit construction
for the above decomposition.


8. Prove: If f is a PE, then the functions $g_{u'}(u) = f(u'u)$ and $h_{u'}(u) = f(uu')$
for a fixed $u' \in \Sigma^*$ and all $u \in \Sigma^*$ are PEs.


9. Prove that there are SPEs f and g such that |f − g| is not an SPE.


10. Prove that if f is an SPE such that the matrices in the underlying system 
are doubly stochastic then a corresponding system defining f can be found with 
doubly stochastic matrices. 


OPEN PROBLEM 


Find a class of events, properly including the regular events and included
[properly] in the class of PEs, which is closed under union, intersection, and
complementation.


3. Bibliographical Notes 


Most of the material presented in this section appeared in the literature under 
various names and with variations. Thus Proposition 1.1 and Exercise 2 are 
due to Starke (1966b, c), Propositions 1.2 and 1.3 are due to Paz (1966), 
Proposition 1.4 and Exercise 8 should be credited to Bukharaev (1967), Pro- 
positions 1.5-1.9 and Exercises 1 and 5 are from Nasu and Honda (1968), 
while Theorems 1.10 and 1.11 were given in a restricted form in Nasu and 
Honda (1968). 

Pseudo probabilistic events were studied by Turakainen (1968) who is to be 
credited for Propositions 2.1, 2.4, and Exercise 3. Many of the proofs are 
however new and some propositions are given here in a stronger version than 
the original. Zadeh (1965) introduced the concept of fuzzy sets generalizing
the classical set concept. The events as introduced here are in fact fuzzy sets
with the “universal” set being the set of all words over a given alphabet. 
Other related papers: Paz (1967c), Carlyle and Paz (1970). 


B. CUT-POINT EVENTS 


1. Closure Properties 


Definition 1.1: Let f be an SPE and λ a real number. The set of words T(f, λ)
is defined as

$$T(f, \lambda) = \{u : f(u) > \lambda\}$$

and is called a general cut-point event (GCE). If f is a PE defined by the
automaton A and $0 \le \lambda < 1$, then T(f, λ), to be denoted also by T(A, λ), is
called a probabilistic cut-point event [PCE].

Using the theorems of the previous section, we shall now study the closure 
properties of PCEs and their relation to GCEs. In fact the first proposition 
shows that the two classes of events are identical. 


Proposition 1.1: The class of PCEs is identical to the class of GCEs.


Proof: The class of PCEs is clearly a subclass of GCEs. To prove the
converse, let E = {u : f(u) > λ} be an event such that f is an SPE. By Proposition
A, 2.1 the function f' = f − λ is also an SPE and has the property that f'(u) > 0
if and only if f(u) > λ. By Theorem A, 2.4 there is a PE g and numbers
0 < c < 1, 0 < d and arbitrary b such that $c^{l(u)} f'(u) = d\,g(u) - b$ for all
$u \in \Sigma^*$, so that f'(u) > 0 if and only if g(u) > b/d. If b/d does not lie in the
interval [0, 1), then the set {u : f'(u) > 0} is equal to $\Sigma^*$ [when b/d < 0] or to
$\emptyset$ [when b/d ≥ 1], and these events are PCEs as will be shown subsequently.
Otherwise, E = {u : f(u) > λ} = {u : f'(u) > 0} = {u : g(u) > b/d} where g is a PE.
Thus E is a PCE and the proposition is proved. ∎


Proposition 1.2: The class of regular events is a subset of the class of PCEs. 


Proof: If E is a regular event, then its characteristic function can be rep- 
resented in a degenerate PA [see Section A, 1]. Thus there is a PE f such 
that E = T( f, 0). || 
Proposition 1.3: The class of PCEs is not changed if the defining pseudo prob- 
abilistic automata are restricted to have only degenerate initial distributions. 


Proof: By Exercise 5, Section A, 2. || 


Proposition 1.4: Let T(f, λ) be a PCE and let μ be a number 0 < μ < 1.
There is a PE g such that T(f, λ) = T(g, μ).
Proof: If μ < λ, then μ = αλ with 0 < α < 1 and, using Corollary A, 1.4,
we may use the PE g = αf. Clearly f(u) > λ if and only if g(u) > μ. If λ < μ,
then let g be defined as

$$g = \frac{1-\mu}{1-\lambda}\, f + \frac{\mu-\lambda}{1-\lambda}$$

By Corollary A, 1.7 g is a PE and f(u) > λ if and only if

$$g(u) > \frac{1-\mu}{1-\lambda}\,\lambda + \frac{\mu-\lambda}{1-\lambda} = \mu$$

as required. ∎
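
The two cases of the proof can be checked numerically. The sketch below is only an illustration: the function name `rescale` is hypothetical, and the case μ = λ (where g = f already works) is folded into the first branch.

```python
from fractions import Fraction

def rescale(f_value, lam, mu):
    """Value g(u) obtained from f(u) so that f(u) > lam iff g(u) > mu.

    Assumes 0 <= lam < 1 and 0 < mu < 1, as in Proposition 1.4."""
    f_value, lam, mu = Fraction(f_value), Fraction(lam), Fraction(mu)
    if mu <= lam:
        alpha = mu / lam                 # 0 < alpha <= 1, so g = alpha * f
        return alpha * f_value
    # Convex combination of f and the constant event 1.
    return (1 - mu) / (1 - lam) * f_value + (mu - lam) / (1 - lam)
```

For instance, with λ = 3/10 and μ = 1/2 the value f(u) = 3/10 is mapped exactly onto the new cut-point 1/2, and larger or smaller values of f land on the corresponding side of it.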
Remark: The requirement that 0 < μ is necessary since any PCE of the
form T(f, 0) defines a regular event [see Exercise 2, Section A, 2] and the class
of PCEs properly contains the regular events. This fact will be proved later.


Proposition 1.5: If E is a PCE and R is a regular event then $E \cup R$, $E \cap R$
and E − R (meaning the set of words in E but not in R) are PCEs.


Proof: Let E = T(f, λ) and R = T(g, 0) where g(u) is either 0 or 1 for all
$u \in \Sigma^*$. Then fg is a PE by Proposition A, 1.3. It is easily verified that
$E \cap R = T(fg, \lambda)$, for fg(u) > λ if and only if f(u) > λ and g(u) = 1.
Consider now the function $f\bar{g} + g$, where $\bar{g} = 1 - g$. By Proposition A, 1.5 this
function is a PE [the function h in that proposition is the function g here and
the function g is the constant function with value 1 here]. If g(u) > 0, then
$(f\bar{g} + g)(u) = 1 > \lambda$. If g(u) = 0, then $(f\bar{g} + g)(u) > \lambda$ if and only if f(u) > λ. It
follows that $T(f\bar{g} + g, \lambda) = E \cup R$. To complete the proof we note that
$E - R = E \cap \bar{R}$ and $\bar{R}$ is a regular event. ∎

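The reverse of an event E, to be denoted by $\tilde{E}$, is defined in the usual way,
i.e., $\tilde{E}$ contains all the words $\sigma_k \cdots \sigma_1$ such that $\sigma_1 \cdots \sigma_k$ are in E. We are
now able to prove the following:

Proposition 1.6: The class of PCEs is closed under the reverse operation.

Proof: Let E be the PCE E = T(f, λ). Then $\tilde{E} = T(\tilde{f}, \lambda)$ because

$$\tilde{E} = \{u : u = \sigma_k \cdots \sigma_1,\ \sigma_1 \cdots \sigma_k \in E\}
= \{u : u = \sigma_k \cdots \sigma_1,\ f(\sigma_1 \cdots \sigma_k) > \lambda\}
= \{u : u = \sigma_k \cdots \sigma_1,\ \tilde{f}(\sigma_k \cdots \sigma_1) > \lambda\}$$

and by Theorem A, 1.8 $\tilde{f}$ is a PE. ∎

Proposition 1.7: Let E = T(f, λ) be a PCE such that the set {u : f(u) = λ} is
regular; then $\bar{E} = \Sigma^* - E$ is a PCE.

Proof: E = T(f, λ) = {u : f(u) > λ}; therefore, with $\bar{f} = 1 - f$,

$$\bar{E} = \{u : f(u) \le \lambda\} = \{u : \bar{f}(u) \ge 1 - \lambda\}
= \{u : \bar{f}(u) > 1 - \lambda\} \cup \{u : \bar{f}(u) = 1 - \lambda\}
= \{u : \bar{f}(u) > 1 - \lambda\} \cup \{u : f(u) = \lambda\}$$

To complete the proof we use Propositions A, 1.2 and B, 1.5. ∎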

Remark: In the proof of Theorem A, 1.10 a PE f is given such that all the
sets {x : f(x) > λ}, {x : f(x) < λ} and {x : f(x) = λ} are nonregular for a particular
value of λ. Thus the condition of Proposition 1.7 does not hold true in all cases.

Proposition 1.8: The class of PCEs is closed under the operation of finite
derivation. [The derivate of an event E with respect to the word u is the event
$D_u(E) = \{v : uv \in E\}$.]


Proof: Let E = T(A, λ) be the PCE induced by the PA $A = (S, \pi, \{A(\sigma)\}, \eta^F)$. Let $A_u$
be the PA $A_u = (S, \pi(u), \{A(\sigma)\}, \eta^F)$ where $\pi(u) = \pi A(u)$; then

$$T(f^{A_u}, \lambda) = \{w : \pi(u)A(w)\eta^F > \lambda\} = \{w : \pi A(uw)\eta^F > \lambda\} = \{w : uw \in E\}$$

It follows that $D_u(E)$ is a PCE. ∎
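
The construction in this proof only replaces the initial distribution. A minimal Python sketch, assuming a PA is given as an initial row vector, a dictionary of transition matrices indexed by letters, and a final column vector (the helper names are hypothetical):

```python
from fractions import Fraction

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def accept_prob(pi, mats, eta, word):
    """p(word) = pi A(word) eta for the PA (pi, mats, eta)."""
    v = list(pi)
    for s in word:
        v = vec_mat(v, mats[s])
    return sum(a * b for a, b in zip(v, eta))

def derivative_pa(pi, mats, eta, u):
    """The PA A_u: same matrices and final vector, initial vector pi A(u)."""
    v = list(pi)
    for s in u:
        v = vec_mat(v, mats[s])
    return v, mats, eta
```

With this representation, `accept_prob(*derivative_pa(pi, mats, eta, u), w)` equals `accept_prob(pi, mats, eta, u + w)` for every word w, which is the identity used in the proof.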


Proposition 1.9: Let E be an event and assume that all the events of the form
$D_u(E)$ for l(u) = k, k an arbitrary fixed integer, are PCEs. Then E is a PCE.


Proof: We shall prove the proposition for k = 1; the proof in the general
case is similar. We remark first that $E = \bigl(\bigcup_{\sigma \in \Sigma} \sigma D_\sigma(E)\bigr) \cup F$ where F is empty
or contains the word e only,† and is therefore a regular event. Since PCEs are
closed under union with regular events it suffices to prove that $E' = \bigcup_{\sigma \in \Sigma} \sigma D_\sigma(E)$
is a PCE. Our second remark is concerned with the possibility of inducing a
"delay" into a PA. Let $A = (S, \pi, \{A(\sigma)\}, F)$ be a PA defining the event
T(A, λ). Define the PA $A' = (S', \pi', \{A'(\sigma)\}, F')$ as follows: $S' = S \cup s^*$,
$s^* \notin S$; $\pi'$ is the degenerate probabilistic vector having a 1 in its first entry
only, the other entries being 0; F' = F and, finally, $A'(\sigma)$ is the matrix

$$A'(\sigma) = \begin{pmatrix} 0 & \pi \\ 0 & A(\sigma) \end{pmatrix}$$

It is easily verified that for any word $u = \sigma_1 \cdots \sigma_k \in \Sigma^*$, l(u) ≥ 1,

$$p^{A'}(\sigma_1 \sigma_2 \cdots \sigma_k) = p^{A}(\sigma_2 \cdots \sigma_k), \qquad p^{A'}(\sigma) = p^{A}(e), \qquad p^{A'}(e) = 0$$

Let $f_{\sigma_i}$ be the characteristic function of the event $\sigma_i \Sigma^*$; as the $\sigma_i \Sigma^*$ are regular
events the $f_{\sigma_i}$ are PEs by Proposition 1.2, and $\sum_{\sigma_i \in \Sigma} f_{\sigma_i} = f_1$ = the constant
function $f_1(u) = 1$ if l(u) ≥ 1. Assume that $D_{\sigma_i}(E) = T(A_i, \lambda)$; one may assume
the same λ for all $\sigma_i$ because of Proposition 1.4. Finally, let $A_i'$ be the PA
derived from the $A_i$ as above, i.e., $p^{A_i'}(\sigma_1 \cdots \sigma_k) = p^{A_i}(\sigma_2 \cdots \sigma_k)$. We claim
that E' = T(A, λ), where $f^{A} = \sum_{\sigma_i \in \Sigma} f_{\sigma_i} f^{A_i'}$ [which is a PE by Corollary
A, 1.6]. To prove our claim we remark that for any word u with l(u) ≥ 1, if
$u = \sigma_i w$, then $f_{\sigma_i}(u) = 1$, $f_{\sigma_j}(u) = 0$ for j ≠ i, and $f^{A_i'}(u) = f^{A_i}(w)$. Thus, for
$u = \sigma_i w$, $f^{A}(u) = f^{A_i}(w)$ with the result that $u \in T(A, \lambda)$ if and only if
$w \in D_{\sigma_i}(E)$, or $T(A, \lambda) = \bigcup_{\sigma_i \in \Sigma} \sigma_i D_{\sigma_i}(E)$. This completes the proof. ∎


†The empty word will be denoted by e instead of λ whenever necessary in order to
avoid confusion with the cut-point λ.


Theorem 1.10: There are events which are not PCEs.


Proof: We define an event E over a single letter alphabet $\Sigma = \{\sigma\}$ which is
not a PCE. Let $u_1, u_2, \ldots$ be a lexicographical enumeration of all nonempty
words over a two letter alphabet $\Delta = \{a, b\}$. Let x be the infinite sequence of
letters from Δ resulting from the concatenation of the words $u_1, u_2, \ldots$ in their
proper order [e.g., $u_1 = a$, $u_2 = b$, $u_3 = aa$, $u_4 = ab$, etc., and $x = abaaab\cdots$].
Let x(n) denote the nth letter in the sequence x and define the event over Σ

$$E = \{\sigma^n : x(n) = a\}$$

then E is not a PCE. To prove this, assume the contrary. Thus E = T(f, λ)
for some PE f and some cut point λ. This means that $f(\sigma^n) > \lambda$ for x(n) = a
and $f(\sigma^n) \le \lambda$ for x(n) = b. By Theorem 2.8 in Section II, C, there exists a
set of constants $c_0, \ldots, c_{n-1}$ such that for any integer k

$$f(\sigma^{k+n}) = c_0 f(\sigma^{k}) + c_1 f(\sigma^{k+1}) + \cdots + c_{n-1} f(\sigma^{k+n-1}) \qquad (*)$$

Let $\epsilon_0 \cdots \epsilon_n$, $\delta_0 \cdots \delta_n$ be two words in $\Delta^*$ defined as follows: if $c_i \ge 0$, then
$\epsilon_i = b$ and $\delta_i = a$; if $c_i < 0$, then $\epsilon_i = a$ and $\delta_i = b$; $\epsilon_n = a$, $\delta_n = b$. By the
construction of the sequence x, there are integers $k_1$ and $k_2$ such that
$x(k_1)x(k_1 + 1) \cdots x(k_1 + n) = \epsilon_0 \cdots \epsilon_n$ and $x(k_2)x(k_2 + 1) \cdots x(k_2 + n) = \delta_0 \cdots \delta_n$.
It follows that if $c_i \ge 0$, then $x(k_1 + i) = b$ and $f(\sigma^{k_1+i}) \le \lambda$,
and also $x(k_2 + i) = a$ and $f(\sigma^{k_2+i}) > \lambda$. If $c_i < 0$, then $x(k_1 + i) = a$ and
$f(\sigma^{k_1+i}) > \lambda$ and also $x(k_2 + i) = b$ and $f(\sigma^{k_2+i}) \le \lambda$. Finally, $x(k_1 + n) = a$ so that
$f(\sigma^{k_1+n}) > \lambda$, and $x(k_2 + n) = b$ so that $f(\sigma^{k_2+n}) \le \lambda$. We evaluate now the
formula (*) first for $k = k_1$. We have

$$\lambda < f(\sigma^{k_1+n}) = \sum_{i=0}^{n-1} c_i f(\sigma^{k_1+i}) \le \lambda \sum_{i=0}^{n-1} c_i \quad\text{or}\quad \sum_{i=0}^{n-1} c_i > 1$$

The inequality on the right follows from the fact that the values $f(\sigma^{k_1+i})$
corresponding to positive coefficients $c_i$ are not decreased while the values $f(\sigma^{k_1+i})$
corresponding to negative coefficients $c_i$ [if there exist such coefficients] are
decreased. On the other hand, evaluating the formula (*) for $k = k_2$ we have

$$\lambda \ge f(\sigma^{k_2+n}) = \sum_{i=0}^{n-1} c_i f(\sigma^{k_2+i}) \ge \lambda \sum_{i=0}^{n-1} c_i \quad\text{or}\quad \sum_{i=0}^{n-1} c_i \le 1$$

since in this case the values $f(\sigma^{k_2+i})$ corresponding to positive coefficients [if
there are such] are decreased and values $f(\sigma^{k_2+i})$ corresponding to negative
coefficients are not decreased. Thus, $1 < \sum_{i=0}^{n-1} c_i \le 1$, which is impossible, and
therefore the event E is not a PCE. ∎
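
The sequence x and the event E of the proof are easy to generate. The following Python sketch enumerates the nonempty words over {a, b} by length and, within each length, alphabetically, which reproduces the beginning abaaab of x quoted in the proof; the function names are hypothetical.

```python
from itertools import count, product

def x_prefix(n):
    """First n letters of x = u_1 u_2 u_3 ..., the concatenation of all
    nonempty words over {a, b} in the order a, b, aa, ab, ba, bb, aaa, ..."""
    letters = []
    for length in count(1):
        for word in product("ab", repeat=length):
            letters.extend(word)
            if len(letters) >= n:
                return "".join(letters[:n])

def in_E(n):
    """True when the one-letter word sigma^n belongs to E (n >= 1)."""
    return x_prefix(n)[n - 1] == "a"
```

Every word over {a, b} occurs as a block of consecutive letters of x, which is the only property of x that the proof uses.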


Remark: The reader familiar with the theory of abstract languages will find
it easy to show that the event E defined above is context sensitive. It could
not be context free, for any context free event over a single letter is also
regular [this is a well-known fact] and regular events are PCEs. On the other
hand, the only property of the sequence used in the proof of Theorem 1.10 is
that any word of $\Delta^*$ be a subsequence of x. Thus, by defining the sequence x
in a more complicated way, but still having that property, it would be possible
to find an event which is recursive, not context sensitive and not a PCE.


EXERCISES


1. Let A be a given PA. Prove that the sets of words $\{u : p^A(u) = 0\}$, $\{u : p^A(u) > 0\}$,
$\{u : p^A(u) = 1\}$ and $\{u : p^A(u) < 1\}$ are all regular sets.
2. Let A be a PA such that S contains two elements only and Σ contains one
element only. Prove that T(A, λ) is a regular set for any λ, 0 ≤ λ < 1.
3. Consider the following PA: A = (S, z, {A(o)}, 1?) over X = (o,, o3}, where 
S = (5,55... 533 T = (10--- 0); 4” = (në) is defined by the requirement 
that 
1 if i=4 
0 otherwise 
and A(o,,) = [a;,(c,)] is defined by the relations: 

a, (0,) = as(01) = aa(0:) = ass(0:) = as(0:) 

= ag(01) = ai(0;) = a4(0;) = a«(0;) = anlO) = agla) = 1 

a;(0;) = a7(0,) = €, 4,(0,) = anlO) = 1 — €, 

(2) = A6(F2) = Bs6(F2) = Ase(F2) = à. 

as(0;) = Ò, a3(0,) = 1 — ò, and a;(0,) = 0 
in all other cases with 0 < ε < 1, 0 < δ < 1. Let λ be the number defined
as λ = δ/2. Describe explicitly the sets of words $\{u : f^A(u) > \lambda\}$, $\{u : f^A(u) = \lambda\}$,
$\{u : f^A(u) < \lambda\}$ and show that these sets are not regular.


nf = 


4. Same as previous exercise with 4 = 4 and the PA A = (S, 7, {A(a)}, n") 
defined es follows: 2(o,, 02}, S = {5,,..., s, 1 = (10--- 0), nF = Lifi = 5, 
and y7 = 0 otherwise, 
44201) = 43(0;) = ass(0;) = as(01) 
= Ayr) = a4«(0;) = as«(0;) = asi(0;) = 1 
a;0;) = ax(01) = au(01) = a4s(01) 
= ax(0;) = ax(0;) = ax(0;) = a«(0;) = 4 
a; (0,) = O in all other cases. 


5. Let A and B be two PAs. Prove that the set of words $\{u : f^A(u) > f^B(u)\}$ is
a PCE.


6*. Consider the following PA: $A = (S, \pi, \{A(\sigma)\}, \eta^F)$ over the alphabet
$\Sigma = \{\sigma_1, \ldots, \sigma_k\}$ where $S = \{s_1, s_2\}$, π and $\eta^F$ are arbitrary and the matrices $A(\sigma_i)$
are defined as

$$A(\sigma_i) = \begin{pmatrix} 1 - a_i & a_i \\ b_i & 1 - b_i \end{pmatrix}$$

and are such that for all i and j, $a_i + b_j \ne 0$, $a_i b_j \ne 1$ and $a_i b_j = a_j b_i$. Prove
that T(A, λ) is a regular set, for any cut-point λ.


7. Let M be an SSM [see Definition 1.1, Section I, A]. Let (u, v) be a pair of
words of same length over the input and output alphabets X and Y respectively
of M. Let y be a symbol in the output alphabet Y and let 0 ≤ λ < 1 be a real
number. Let A(u) be the matrix $A(u) = \sum_v A(v|u)$ [summation is over all v
with l(u) = l(v)] and let $p^M(y|u)$ denote the probability that the machine M will
have y as its last output when the word u is fed into it.

a. Prove that if the set of different matrices A(u), $u \in X^*$, is finite then the
set of words

$$T(M, \lambda, y) = \{u : p^M(y|u) > \lambda\}$$

is a regular set for any λ and y as above.

b. Assuming that the set $\{A(u) : u \in X^*\}$ contains at most m different elements,
find the number of states of a minimal automaton defining T(M, λ, y).


8. Prove that if in Exercise 7 the matrices A(x), $x \in X$, are degenerate stochastic
[i.e., deterministic], then the set of words $\{u : p^M(y|u) > \lambda\}$ is regular for
any y and λ as in that exercise.


OPEN PROBLEMS 
1. Are PCEs closed under union, intersection, and complementation? 


2. Give a decision procedure for ascertaining whether a set of matrices {A(σ)}
generates only finitely many different matrices in the set A(u), $u \in \Sigma^*$.


2. Regular Events and Probabilistic Cut-Point Events 


The following theorem, due to Nerode, is very useful and we shall have the 
occasion to use it many times. It serves as a characterization of regular events. 
[In order to comply with the common notation, we shall denote, from here
on, by x, y, z, ... words over an alphabet Σ.]


Theorem 2.1: Let U be a set of words. The following three conditions are
equivalent:


1. U is a regular set.

2. U is the union of some of the equivalence classes of a right invariant
equivalence relation over $\Sigma^*$ of finite index.

3. The explicit right invariant equivalence relation E, defined by the condition
that for all x, y in $\Sigma^*$, x E y if and only if for all $z \in \Sigma^*$, whenever xz
is in U, yz is in U and conversely, is a relation of finite index. The index of
the relation is the least number of internal states of any automaton defining U.

The reader is referred to Rabin and Scott (1959) for the proof of this
well-known theorem.

We shall need also the following combinatorial lemma due to Rabin [Rabin
(1963)].


Lemma 2.2: Let $P_n$ be the set of all n-dimensional probabilistic vectors [i.e.,
$P_n = \{\xi = (\xi_i) : \xi_i \ge 0,\ \sum_i \xi_i = 1\}$] and let $U_\epsilon$ be a subset of $P_n$ such that for
any pair of vectors ξ and η in $U_\epsilon$ the inequality $\sum_{i=1}^{n} |\xi_i - \eta_i| \ge \epsilon$ [ε is a given
positive real number] holds true. Then $U_\epsilon$ is a finite set containing at most
k(ε) elements where $k(\epsilon) = (1 + 2/\epsilon)^{n-1}$.


Proof: Let $\xi = (\xi_i)$ be a point in $U_\epsilon$ and define the set of points $v_\xi$ in
n-dimensional space as $v_\xi = \{\zeta = (\zeta_i) : \xi_i \le \zeta_i,\ \sum_i (\zeta_i - \xi_i) = \epsilon/2\}$. It is easy to
see that each $v_\xi$ is a translate of the set $v = \{\zeta = (\zeta_i) : \zeta_i \ge 0,\ \sum_i \zeta_i = \epsilon/2\}$.

Since ξ is a probabilistic vector and $\xi_i \le \zeta_i$ for all i we have also that $v_\xi$ is
a subset of the set of points $V_\epsilon = \{\zeta = (\zeta_i) : \zeta_i \ge 0,\ \sum_i \zeta_i = 1 + \epsilon/2\}$. A point
ζ is an interior point in a set $v_\xi$ [relative to the $V_\epsilon$ set] if and only if $\zeta_i > \xi_i$
for all i. Figure 16 exhibits the different sets defined above for n = 3; ξ and
η are two points in $U_\epsilon$. It follows from the definitions that two different sets
$v_\xi$ and $v_\eta$ cannot have a common interior point. Assuming the contrary, if ζ
is an interior point of both $v_\xi$ and $v_\eta$ then $\zeta_i > \xi_i$ and $\zeta_i > \eta_i$ for all i and,
therefore, $|\xi_i - \eta_i| < |\zeta_i - \eta_i| + |\zeta_i - \xi_i|$ for all i. This would imply that

$$\sum_i |\xi_i - \eta_i| < \sum_i (\zeta_i - \eta_i) + \sum_i (\zeta_i - \xi_i) = \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

which is impossible by the definition of the set $U_\epsilon$.

[Figure 16. Geometrical representation of the sets $v_\xi$ for n = 3.]

It is thus seen that the number of points in $U_\epsilon$ cannot be larger than the
number of simplices $v_\xi$ which can be packed into the simplex $V_\epsilon$. To get an
estimate of this number let $S(v_\xi)$ be the volume of the simplex $v_\xi$; then
$S(v_\xi) = c(\epsilon/2)^{n-1}$ where c is a constant not depending on ε. Similarly,
$S(V_\epsilon) = c(1 + \epsilon/2)^{n-1}$. Therefore, if k simplices $v_\xi$ can be packed into $V_\epsilon$ then
$kc(\epsilon/2)^{n-1} \le c(1 + \epsilon/2)^{n-1}$. Thus $k \le (1 + 2/\epsilon)^{n-1}$ and this completes the
proof. ∎


Remarks: One may prove that the set $U_\epsilon$ is finite in a much easier way by
using the Bolzano-Weierstrass theorem, since the set $U_\epsilon$ can be shown to be
bounded with no accumulation point under the measure $\sum_i |\zeta_i|$. On the other
hand, the proof given here provides also a bound on the number of elements
in $U_\epsilon$. This brings up an open problem. The bound of the lemma is clearly
not sharp and a sharper bound can be proved provided one can get an estimate
for the "covering ratio" of the packing problem involved in the proof. In a
more explicit way, consider the following problem: Let V be a simplex of side
length a and let $v_i$ be simplices of side length b ≤ a and having the same linear
dimension. Let k be the maximal number of simplices $v_i$ which can be packed
into V and such that all the $v_i$ are in a relative translated position one to the
other [no rotation is allowed]. Provide an estimate of the ratio $kS(v_i)/S(V)$,
where S denotes the volume of the respective simplices. A solution to this
problem will lower the bound of the lemma by the above ratio [which may
depend on the dimension n of the involved simplices].

The next definition and theorem will provide a sufficient condition for a PCE
to be a regular event.


Definition 2.1: Let A be a PA. The cutpoint λ is ε-isolated with respect to A
if $|p^A(x) - \lambda| \ge \epsilon$ for all $x \in \Sigma^*$, for some ε > 0.


Theorem 2.3: If λ is an ε-isolated cutpoint for a PA A, then there exists a
deterministic automaton B such that T(A, λ) = T(B). If A has n states, then
B can be chosen to have m states where

$$m \le \left(1 + \frac{1}{2\epsilon}\right)^{n-1}$$

Proof: Translating the equivalence E in Nerode's theorem [third condition]
into probabilistic terms we have that x, y ∈ Σ* are nonequivalent words if
there is a word z such that p(xz) > λ and p(yz) ≤ λ or vice versa. This
means that $\pi(x)\eta^F(z) > \lambda$ and $\pi(y)\eta^F(z) \le \lambda$ or vice versa. It follows that
$[\pi(x) - \pi(y)]\eta^F(z) \ge 2\epsilon$, since λ is isolated. Writing this inequality explicitly
we have

$$\sum_i (\pi_i(x) - \pi_i(y))\,\eta_i^F(z) \ge 2\epsilon \qquad (*)$$

but

$$\begin{aligned}
\sum_i (\pi_i(x) - \pi_i(y))\,\eta_i^F(z)
&\le \sum_i (\pi_i(x) - \pi_i(y))^{+} \max_i \eta_i^F(z) + \sum_i (\pi_i(x) - \pi_i(y))^{-} \min_i \eta_i^F(z) \\
&= \sum_i (\pi_i(x) - \pi_i(y))^{+} \bigl(\max_i \eta_i^F(z) - \min_i \eta_i^F(z)\bigr) \\
&\le \sum_i (\pi_i(x) - \pi_i(y))^{+} = \tfrac12 \sum_i |\pi_i(x) - \pi_i(y)|
\end{aligned}$$

[the superscripts + and − denote the positive and negative terms of the sum,
which have equal absolute totals because π(x) and π(y) are both stochastic
vectors] by using repeatedly an argument similar to that used in the proof of
Proposition A, 1.3 in Chapter II and by the fact that $0 \le \eta_i^F(z) \le 1$ for all i.

Combining this with the previous inequality (*) we have that, for
nonequivalent words x and y, the following inequality holds

$$2\epsilon \le \tfrac12 \sum_i |\pi_i(x) - \pi_i(y)| \quad\text{or}\quad \sum_i |\pi_i(x) - \pi_i(y)| \ge 4\epsilon$$

Thus the set of all vectors of the form π(x) such that every two vectors in the
set are nonequivalent is a set of the form $U_{4\epsilon}$ in Lemma 2.2 and this implies
that this set is finite with at most

$$k = \left(1 + \frac{1}{2\epsilon}\right)^{n-1}$$

elements. Nerode's equivalence is thus shown to be of index ≤ k, so that k
is an upper bound for the minimal number of states of a deterministic automaton
defining the given PCE. ∎
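
The bound can be evaluated directly from Lemma 2.2, since the proof applies the lemma with 4ε in place of ε. A short Python sketch (the function names are hypothetical; exact rational arithmetic is used so that rounding cannot affect the integer part):

```python
from fractions import Fraction
from math import floor

def lemma_bound(n, eps):
    """k(eps) = (1 + 2/eps)^(n-1) of Lemma 2.2, rounded down."""
    return floor((1 + 2 / Fraction(eps)) ** (n - 1))

def state_bound(n, eps):
    """Bound on the number of states of a deterministic automaton for
    T(A, lambda) when A has n states and lambda is eps-isolated; the
    proof of Theorem 2.3 applies Lemma 2.2 to a set of the form U_{4 eps}."""
    return lemma_bound(n, 4 * Fraction(eps))
```

For example, a three-state PA with a cut-point isolated by ε = 1/10 yields the bound (1 + 5)² = 36, while for a two-state PA the bound grows only linearly in 1/ε.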


Remark: The above theorem, due to Rabin (1963), is clearly one of the most
interesting theorems in the theory of PCEs. The following is a quotation from
Rabin's original paper and it shows his motivation for introducing the concept
of an isolated cutpoint.

"Let A be a PA and 0 ≤ λ < 1. Given a tape x ∈ Σ*, we devise the
following probabilistic experiment E to test whether x ∈ T(A, λ). We run x
through A a large number N of times, and count the number m(E) of times
that A ended in a state in F. If λ < m(E)/N, we accept x; otherwise we reject
it. Because of the probabilistic nature of the experiment, it is of course possible
that we sometimes accept x even though x ∉ T(A, λ), or reject it even though
x ∈ T(A, λ). By the law of large numbers, however, there exist for each x such
that p(x) ≠ λ and each 0 < ε a number N(x, ε) such that

$$\Pr\left(\lambda < \frac{m(E)}{N(x,\epsilon)} \;\middle|\; x \in T(A, \lambda)\right) > 1 - \epsilon$$

In other words, the probability of obtaining the correct answer by the
experiment E (consisting of running x through A N(x, ε) times and counting successes)
is greater than 1 − ε.

To perform the above stochastic experiment we must know N(x, ε), which
depends on |p(x) − λ|. Thus we actually have to know p(x) in advance if we
want to ascertain whether x ∈ T(A, λ) with probability greater than 1 − ε of
being correct. Once we know p(x), however, the whole experiment E is
superfluous.

The way out is to consider values λ such that |p(x) − λ| is bounded from
below for all x ∈ Σ*.

It is readily seen that there exists an integral valued function N(δ, ε) such
that for an isolated λ and any x ∈ Σ*,

$$\Pr\left(\lambda < \frac{m(E)}{N(\delta,\epsilon)} \;\middle|\; x \in T(A, \lambda)\right) > 1 - \epsilon$$

Thus the proposed stochastic experiment for determining whether x ∈ T(A, λ)
can be performed without any a priori knowledge of p(x). This fact makes it
natural to consider isolated cut-points."
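
The experiment described in the quotation can be sketched in a few lines of Python. The quotation only asserts that a suitable number of runs exists; the explicit choice below, derived from Hoeffding's inequality, is an assumption of this illustration and not Rabin's own estimate. The callback `run_once`, which performs a single run of the automaton on the fixed tape and reports whether it ended in F, is likewise hypothetical.

```python
import math
import random

def sample_size(delta, eps):
    """A number N of runs that suffices, by Hoeffding's inequality, for the
    test to err with probability at most eps when |p(x) - lam| >= delta:
    exp(-2 N delta^2) <= eps."""
    return math.ceil(math.log(1 / eps) / (2 * delta ** 2))

def experiment(run_once, lam, delta, eps, rng=random):
    """Accept the tape (return True) when lam < m/N, where m counts the
    runs that ended in a final state."""
    n = sample_size(delta, eps)
    hits = sum(1 for _ in range(n) if run_once(rng))
    return lam < hits / n
```

A run with acceptance probability 3/4 could be simulated by `run_once = lambda rng: rng.random() < 0.75`. The sample size depends only on the isolation δ and on ε, not on p(x), which is the point of the quoted argument.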


It is to be noticed here that in Rabin's argument above the testing procedure
requires that the number N(x, ε) be determined before the experiment begins.
If this requirement is removed, then we do not have to know p(x) in advance
for ascertaining whether x ∈ T(A, λ) with given probability. This fact follows
from the following theorem due to Darling and Robbins (1968).


Theorem: Let $x_1, x_2, \ldots$ be a sequence of independent variables with
$P_\delta(x_i = 1) = (1 + \delta)/2$, $P_\delta(x_i = -1) = (1 - \delta)/2$, −1 < δ < 1, so that $E_\delta(x_i) = \delta$ [E
denotes here expectation]. Let $H^+$ be the hypothesis that δ > 0, and $H^-$ the
hypothesis that δ < 0. For an arbitrary given 0 < ε < 1, there is a test of
$H^+$ versus $H^-$ such that if T denotes the sample size of the test, then


1. $P_\delta(T < \infty) = 1$, all δ ≠ 0.
2. $P_\delta(\text{accept } H^-) < \epsilon$, all δ > 0; $P_\delta(\text{accept } H^+) < \epsilon$, all δ < 0.
3. $E_\delta(T) < \infty$, all δ ≠ 0.


Since one may always assume that λ = ½ [see Proposition 1.4], the above
theorem shows that there is a testing procedure for a word x to ascertain
whether p(x) > ½ [$H^+$; $x_i = 1$ if x is accepted at the ith trial and $x_i = -1$ if
x is rejected at the ith trial]. The testing procedure is finite with probability
1 [(1) in the Darling and Robbins theorem] and does not depend on δ (= |p(x) − ½|)
but only on the required degree of reliability ε. The only assumption still
necessary is that p(x) ≠ λ.

It is also worth mentioning that to decide whether a given cut-point λ is
isolated or not is an open problem which seems to be as difficult as the problem
of deciding whether a given PCE is regular. Moreover the condition of Theorem
2.3 is only a sufficient condition for the PCE to be regular. This last fact will
be proved later [see Corollary 3.4].


EXERCISES


1. A cut-point λ is weakly isolated for a PA A if |p(x) − λ| ≥ ε or p(x) = λ
for all x ∈ Σ*. Prove that if λ is a weakly isolated point for A, then the event
T(A, λ) is regular.

2. Two PAs A and B are mutually isolated if $|p^A(x) - p^B(x)| \ge \lambda$ for all
x ∈ Σ*. Prove that if A and B are mutually isolated, then the event
$E = \{x : p^A(x) > p^B(x)\}$ is a regular event.

3. Let $E = (E_i)$ be a partition of Σ*. E is called regular if there are only
finitely many blocks $E_i$ in E and all the $E_i$ are regular events. Prove: Any regular
partition $E = (E_i)$ of Σ* can be represented in the form $E_i = T(A, \lambda_i)$ where
A is a PA. Conversely, if A is a PA such that the set of values $\{p^A(x) : x \in \Sigma^*\}$
is finite, then the set of events $\{x : p^A(x) = k_i\}$ form a regular partition of Σ*,
where $k_1, \ldots, k_t$ is the set of all different possible values $p^A(x)$.

4. Prove that the bound of Lemma 2.2 can be improved for n — 2 so that 
; 2 1 
k(e) < 7 t 3e 

in this case. 


4. Prove Theorem 2.3 for the following case: The automata A are allowed to
have nonrestricted final vectors $\eta^F = (\eta_i^F)$ [i.e., $\eta_i^F$ may assume any real value
and is no longer restricted to the values 0 and 1], and in addition, the cut-point
λ is also allowed to assume any real value, all the rest of the components of A
remaining as in the original definition. Prove that for this case the bound of
Theorem 2.3 is $k(\epsilon) = (1 + d/2\epsilon)^{n-1}$ where $d = \max_i \eta_i^F - \min_i \eta_i^F$.

5. A cut-point λ is semiisolated for a PA A if $p^A(x) - \lambda \ge \epsilon$ for all x such
that $p^A(x) > \lambda$ or else $\lambda - p^A(x) \ge \epsilon$ for all x such that $p^A(x) \le \lambda$. Prove
Theorem 2.3 with the term "isolated" replaced by the term "semiisolated" and
give a new bound for this case.


OPEN PROBLEMS 


1. Give a decision procedure for ascertaining whether a cut-point λ is isolated
for a given PA.


2. Give an algorithm for finding all isolated cut-points of a given PA. 
3. Give a sharp bound for Theorem 2.3. 


3. The Cardinality of PCEs and Saving of States 


Theorem 3.1: The class of PCEs is nondenumerable.

Proof: Let Σ = {0, 1} and define A to be the PA $A = (\{s_0, s_1\}, \pi, \{A(\sigma)\}, \eta^F)$
where $\pi = (1\ 0)$; $\eta^F = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$;

$$A(0) = \begin{pmatrix} 1 & 0 \\ \tfrac12 & \tfrac12 \end{pmatrix}; \qquad A(1) = \begin{pmatrix} \tfrac12 & \tfrac12 \\ 0 & 1 \end{pmatrix}$$

It is easy to prove that for $x = \sigma_1 \cdots \sigma_n$, $p^A(x) = .\sigma_n \cdots \sigma_1$ where $.\sigma_n \cdots \sigma_1$
is an ordinary binary fraction [see Exercise A, 5.6 in Chapter II] and $p^A$ is the
function induced by A. Thus the set of numbers $\{p^A(x) : x \in \Sigma^*\}$ is dense in the
open interval (0, 1) for the given PA A. Let $\lambda_1$ and $\lambda_2$ be two cut-points
$0 \le \lambda_1 < \lambda_2 < 1$; then $T(A, \lambda_1) \ne T(A, \lambda_2)$ for there is a word x such that
$p^A(x) > \lambda_1$ and $p^A(x) < \lambda_2$. This follows from the density of the values $p^A(x)$.
Thus the set of different T(A, λ)s coincides with the set of different λs which is
not countable [the λs are real numbers in the interval (0, 1)]. This completes the
proof, since the class of PCEs contains the events of the form T(A, λ) above. ∎
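
The binary-fraction property is easy to check by direct computation. The Python sketch below assumes the standard two-state automaton for which that property holds, with A(0) having rows (1, 0) and (1/2, 1/2), A(1) having rows (1/2, 1/2) and (0, 1), initial vector (1 0) and final state $s_1$; exact fractions are used, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
A = {"0": [[1, 0], [HALF, HALF]],      # transition matrix for the letter 0
     "1": [[HALF, HALF], [0, 1]]}      # transition matrix for the letter 1
PI = [1, 0]                            # start in s0
ETA = [0, 1]                           # s1 is the only final state

def p(word):
    """Acceptance probability pi A(word) eta of the two-state PA."""
    v = PI
    for s in word:
        M = A[s]
        v = [v[0] * M[0][0] + v[1] * M[1][0],
             v[0] * M[0][1] + v[1] * M[1][1]]
    return v[0] * ETA[0] + v[1] * ETA[1]

def binary_fraction(digits):
    """Value of the binary fraction .d1 d2 d3 ..."""
    return sum(Fraction(int(d), 2 ** (i + 1)) for i, d in enumerate(digits))
```

Reading a letter halves the current probability of being in $s_1$ and adds 1/2 when the letter is 1, so the last letter read becomes the most significant digit; this is why the fraction appears with its digits reversed.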


Remark: It follows from the above theorem that there must be nonregular
events [context free, context sensitive, etc.], not representable in deterministic
finite state machines, which are representable in PAs. On the other hand, the
proof of Theorem 3.1 is existential. We shall however exhibit in the following
examples explicit nonregular events, some of them context free, which are
represented in a PA.


Theorem 3.2: Let A be the PA defined in the proof of Theorem 3.1. The event
T(A, λ) is regular if and only if λ is a rational number.


Proof: The class of PCEs is closed under the reverse operation [Proposition
1.6] and so is the class of regular events. It suffices therefore to prove that an
event of the form $\tilde{T}(A, \lambda)$ is regular if and only if λ is a rational number, where
$\tilde{T}(A, \lambda)$ is the reverse of $T(A, \lambda) = \{x = \sigma_1 \cdots \sigma_n : .\sigma_n \cdots \sigma_1 > \lambda\}$, or
$\tilde{T}(A, \lambda) = \{x = \sigma_1 \cdots \sigma_n : .\sigma_1 \cdots \sigma_n > \lambda\}$. Assume first λ to be a rational number, i.e.,
$\lambda = .\lambda_1 \lambda_2 \cdots \lambda_k \lambda_{k+1} \cdots \lambda_{k+m} \lambda_{k+1} \cdots$ where $\lambda_{k+1} \cdots \lambda_{k+m}$ is the recurring period in
the expansion of λ. [One may always assume that the expansion of a rational
number has a recurring period, for one can add the recurring period 0 to a
finite binary expansion.] A finite automaton B defining $\tilde{T}(A, \lambda)$ in this case can
be defined as $B = (S, s_0, M, F)$ where $S = \{s_0, \ldots, s_{k+m+1}\}$, Σ = {0, 1},
$F = \{s_{k+m+1}\}$ and the function M is the function

$$M(s_i, j) = \begin{cases}
s_{i+1} & \text{if } j = \lambda_{i+1},\ i \le k+m-2 \\
s_k & \text{if } j = \lambda_{i+1},\ i = k+m-1 \\
s_{k+m} & \text{if } j = 0 \ne \lambda_{i+1},\ i \le k+m-1 \\
s_{k+m+1} & \text{if } j = 1 \ne \lambda_{i+1},\ i \le k+m-1 \\
s_i & \text{if } j = 0, 1 \text{ for } i \ge k+m
\end{cases}$$

It is easily verified that the above automaton defines the event $\tilde{T}(A, \lambda)$ as
required [the reader is urged to draw a state graph for the automaton B],
proving that $\tilde{T}(A, \lambda)$ is a regular event for rational λ. Assume now that λ is
an irrational number $\lambda = .\lambda_1 \lambda_2 \cdots \lambda_i \lambda_{i+1} \cdots$. Consider the infinite sequence
of symbols $\lambda_1 \lambda_2 \cdots \lambda_i \cdots \lambda_j \cdots$ where the λs are the consecutive digits
appearing in the expansion of λ. No two different suffixes of the above sequence
of the form $\lambda_{i+1} \lambda_{i+2} \cdots$, $\lambda_{j+1} \lambda_{j+2} \cdots$, i < j, can be equal, since otherwise the
sequence of digits $\lambda_{i+1} \lambda_{i+2} \cdots \lambda_j$ would recur periodically in the expansion of
λ, a contradiction to the fact that λ is not a rational number. Let then k be the
smallest integer such that $\lambda_{i+k} \ne \lambda_{j+k}$ for given i < j. Let $z_{ij}$ be the word
defined as

$$z_{ij} = \begin{cases}
\lambda_{i+1} \cdots \lambda_{i+k} & \text{if } \lambda_{i+k} > \lambda_{j+k} \\
\lambda_{j+1} \cdots \lambda_{j+k} & \text{if } \lambda_{j+k} > \lambda_{i+k}
\end{cases}$$

Then either $.\lambda_1 \cdots \lambda_j \lambda_{i+1} \cdots \lambda_{i+k} > \lambda$ and $.\lambda_1 \cdots \lambda_i \lambda_{i+1} \cdots \lambda_{i+k} < \lambda$ in the first
case ($\lambda_{i+k} > \lambda_{j+k}$), or $.\lambda_1 \cdots \lambda_i \lambda_{j+1} \cdots \lambda_{j+k} > \lambda$ and $.\lambda_1 \cdots \lambda_j \lambda_{j+1} \cdots \lambda_{j+k} < \lambda$ in
the second case. Thus the word $z_{ij}$ distinguishes between the words $x_i = \lambda_1 \cdots \lambda_i$
and $x_j = \lambda_1 \cdots \lambda_j$ for any i and j ≠ i. It follows that Nerode's equivalence
is of infinite index for the event $\tilde{T}(A, \lambda)$ and the event is therefore not a regular
event. This completes the proof. ∎
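
For rational λ the deterministic automaton of the proof simply compares the input, digit by digit, with the binary expansion of λ. The Python sketch below is one way to realize this idea and is not claimed to reproduce the original state numbering: state i means that the first i digits agree with the expansion, and two absorbing states record the decision. The names are hypothetical, and the expansion is assumed not to end in an infinite run of 1s.

```python
from fractions import Fraction
from itertools import product

def build_dfa(pre, per):
    """Comparison automaton for lambda = .pre per per per ... (binary).

    `pre` holds the k non-recurring digits, `per` the m >= 1 recurring ones."""
    digits = list(pre) + list(per)
    k, m = len(pre), len(per)
    reject, accept = k + m, k + m + 1        # the two absorbing states

    def step(state, j):
        if state >= k + m:                   # decision already taken
            return state
        if j == digits[state]:               # input still equals the expansion
            return state + 1 if state <= k + m - 2 else k
        return accept if j == 1 else reject  # first differing digit decides

    return step, accept

def accepts(word, pre, per):
    """True when the DFA ends in the accepting state on `word`."""
    step, accept = build_dfa(pre, per)
    state = 0
    for j in word:
        state = step(state, j)
    return state == accept

def binary_value(digits):
    """Value of the finite binary fraction .d1 d2 ..."""
    return sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits))

def cut_point(pre, per):
    """The rational number with expansion .pre per per per ..."""
    k, m = len(pre), len(per)
    return binary_value(pre) + binary_value(per) * Fraction(2 ** m, 2 ** m - 1) / 2 ** k
```

With `pre = []` and `per = [0, 1]` the cut-point is 1/3, and `accepts(w, pre, per)` agrees with the test `binary_value(w) > cut_point(pre, per)` on every word w; the automaton uses k + m + 2 states, as in the proof.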


Corollary 3.3: For any integer n there are regular events which require at least
an n-state deterministic machine for their realization but which can be
represented in a two-state PA.


Proof: The set of deterministic automata with n states or less is finite but
the set of events T(A, λ) as defined in Theorem 3.2 with λ a rational number
is infinite and any two such events are different [this fact is included in the
proof of Theorem 3.1]. Thus, there must be events of the form T(A, λ), λ
rational, requiring more than n states for their deterministic realization. ∎


Corollary 3.4: There are regular events representable in PAs with a nonisolated
cut-point λ.


Proof: As mentioned before, the set of values p(x) for the automaton A
defined in Theorem 3.1 is dense in the interval (0, 1). The events T(A, λ), λ
rational, are therefore regular although the cut-point λ is not isolated.


Remark: Theorem 3.2 provides a class of explicit nonregular events
representable in PAs. Corollary 3.3 shows that it is sometimes possible to save
states [in exchange for precision] by representing a regular event in a PA.
Corollary 3.4 shows that the condition of Theorem 2.3 is a sufficient but not
necessary condition for regularity. In connection with Corollary 3.3 it would be
interesting to find out the exact price [in time and/or precision] one
has to pay in exchange for the saving of states.


Remarks on Equivalence and Reduction of States:† It is easy to see that
most of the state theory developed in Part I of the book for SSMs goes over
to PAs after some small changes are introduced in the basic definitions and
statements [e.g., the first column in H^A will be η^F and not η, etc.]. Thus one
can define reduced and minimal PAs, covering of PAs, accessible states,
connected PAs, equivalent distributions for PAs, etc., and prove practically all of
the theorems proved for SSMs with regard to these notions. In addition a new
notion of equivalence can be introduced for PAs. Consider the following two
definitions:


†This section assumes knowledge of Chapter I of the book.


Definition 3.1: Two distributions π and ρ are equivalent for a PA A if
πη^F(x) = ρη^F(x) for all x ∈ Σ*.

Definition 3.2: Two distributions π and ρ are λ-equivalent of order k for a PA
A if πη^F(x) > λ ⟺ ρη^F(x) > λ for all x ∈ Σ* with l(x) ≤ k. π and ρ are
λ-equivalent if the relation above is true for all x ∈ Σ*.

Two distributions which are equivalent are ipso facto λ-equivalent [of order
k]; in other words, equivalence is a [proper] refinement of λ-equivalence. It
follows therefore from Theorem 3.2 that both types of equivalence may be of
infinite index.
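
Definition 3.2 can be checked by brute force for a fixed order k. The following is a minimal sketch, not part of the original text: the function names are hypothetical, the PA is passed as a dictionary of transition matrices, and the example matrices are the 2-adic ones from Exercise 1 of this section (assumed here only for illustration). The cost grows as |Σ|^k.

```python
from fractions import Fraction
from itertools import product

def mat_vec(mat, col):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * c for a, c in zip(row, col)) for row in mat]

def accept_prob(dist, mats, eta, word):
    """dist * A(w_1) ... A(w_n) * eta, evaluated from the right."""
    col = list(eta)
    for sym in reversed(word):
        col = mat_vec(mats[sym], col)
    return sum(d * c for d, c in zip(dist, col))

def lambda_equivalent_of_order(pi, rho, mats, eta, lam, k):
    """True if pi and rho agree on 'p(x) > lam' for every word with l(x) <= k."""
    for length in range(k + 1):
        for word in product(sorted(mats), repeat=length):
            in_pi = accept_prob(pi, mats, eta, word) > lam
            in_rho = accept_prob(rho, mats, eta, word) > lam
            if in_pi != in_rho:
                return False
    return True

# Illustrative data (assumption): the 2-adic two-state PA.
half = Fraction(1, 2)
MATS = {0: [[1, 0], [half, half]], 1: [[half, half], [0, 1]]}
ETA = [0, 1]
PI = [1, 0]
RHO = [Fraction(3, 4), Fraction(1, 4)]
```

With these data the two distributions agree on the empty word but are separated by the one-letter word 1, so they are λ-equivalent of order 0 and not of order 1 for λ = 1/2.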

There are gedanken experiments for deciding whether two distributions π
and ρ are equivalent [see Theorem B,2.1 in Chapter I for SSMs]. This is, however,
not true for λ-equivalence, as the following theorem shows.


Theorem 3.5: There is a PA and a number λ such that for any integer k there
are at least two distributions which are λ-equivalent of order k but not λ-equivalent.


Proof: Let E be the event over Σ = {0, 1}, E = {x = σ_1 ··· σ_k : .σ_1 ··· σ_k
> λ, λ = .10100100001 ···}, i.e., the binary expansion of λ consists of all the
terms of the form 0^k 1, k = 0, 1, ..., ordered according to the magnitude of
k. Then E is representable in a PA as in Theorem 3.2 and therefore E is also
a [nonregular, since λ is irrational] PCE. Consider the two words

    x_k = 101 ··· 10 ··· 01   and   y_k = 101 ··· 10 ··· 010

As proved in Theorem 3.2, these two words are not equivalent, but one sees
easily that the shortest z such that x_k z ∈ E and y_k z ∉ E [or vice versa]
is

    z = 0 ··· 01   [k zeros followed by 1]

Thus x_k and y_k are λ-equivalent of order k and this proves our theorem.

We come now to the problem of merging of states. If two degenerate
distributions [or states] are equivalent for a PA A, then the two states can be
merged to get a new equivalent PA with fewer states [see Theorem B,2.4 in
Chapter I]. Is there any parallel procedure for λ-equivalence? In other words, if
by some means we were able to find out that two degenerate distributions
are λ-equivalent [as mentioned before, this question is not decidable by
gedanken experiments], would this enable us to get another PA with fewer states
which is λ-equivalent to the original one [meaning that for any initial
distribution of the original PA there is a λ-equivalent distribution for the second
PA and vice versa]? The answer to this problem is negative in general and is
explained in the following argument.

Consider again the set of vectors Z^A = {π(x) : x ∈ Σ*} considered now as
points in n-dimensional space. These points are included in the n-dimensional
simplex Δ(n) = {π : π is an n-dimensional probabilistic vector}. The hyperplane
containing all the points π with πη^F = λ divides Z^A into two subsets
Z_+^A = {π(x) : π(x)η^F > λ} and Z_−^A = {π(x) : π(x)η^F ≤ λ} so that
π(x) ∈ Z_+^A if and only if x ∈ T(A, λ). The merging of two extremal points in
Δ(n) means geometrically a projection, along the line connecting those two
points, of the n-dimensional simplex Δ(n) into the (n − 1)-dimensional simplex
Δ(n − 1). Unless the line connecting the two merging points is parallel to the
hyperplane {π : πη^F = λ}, it may happen that a point in Z_+^A will have its
projection in the set Z_−^A of the (n − 1)-dimensional space. A situation like
that in Figure 17 may occur where both words x_1 and x_2 are accepted, but if
the states s_1 and s_2 are merged then the resulting automaton A′ will accept
x_1 and reject x_2.


Figure 17. Geometrical interpretation of merging of states for
PAs with cut-point.


We conclude this section with an example showing that PAs over an alphabet
Σ containing a single letter may still induce a nonregular PCE.

Theorem 3.6: There exists a 3-state PA A over an alphabet Σ = {σ} containing
a single letter and a cut-point λ such that T(A, λ) is a nonregular event.


Proof: Consider the PA defined as follows: S = {s_1, s_2, s_3}, π = (0 0 1), η^F =
(0 0 1)^T,


2 0 1 
3 3 
Ao) =$ 4 4 with A= 4 


e= 
aj 
je 
The eigenvalues of A(c) are 1, (4 + i45 4/7), (3 — iq 4/7), each having
multiplicity 1. [The reader is urged to verify the computations.] Determining 
the corresponding row and column eigenvectors and using formula (4) in 
Section II, A, 5, we find that 


a(o) = i + uw" + av" (1) 


wheret 


and i, are the conjugates of u and v respectively. Writing formula (1) in a 
trigonometric form we get 


dam) — 4 = ep" sin (n9 + y) Q) 


where c = |u|, p = |v], y = arg(u), and 0 = arg(v). Thus, if and only if 
—n/2 < mÜ +a < 2/2, then a? (c) > 4/11 or o" € T(A, A). We shall need 
here 


Lemma 3.7: If θ is rational in degrees [i.e., θ = 2πr where r is a rational
number], then the only rational values of cos θ are 0, ±1/2, ±1. If θ is irrational
in degrees, then any subinterval of (0, 2π) contains values of the form mθ
(mod 2π).


The proof of this lemma, involving algebraic number theory, is omitted here
and can be found in Niven (1956).
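
The second statement of Lemma 3.7 can be illustrated numerically. The sketch below is not from the original text: the function name is hypothetical, and the angle θ = 1 radian is an assumption chosen only because 1/(2π) is irrational. It searches for the least multiple mθ that lands in a prescribed subinterval modulo 2π, and shows that an angle rational in degrees can miss an interval forever.

```python
import math

def first_multiple_in_interval(theta, lo, hi, max_m=10**6):
    """Least m >= 1 with lo < (m * theta mod 2*pi) < hi, or None if none is found."""
    two_pi = 2 * math.pi
    for m in range(1, max_m + 1):
        if lo < (m * theta) % two_pi < hi:
            return m
    return None

# theta = 1 radian is irrational in degrees: some multiple enters (3.0, 3.1).
HIT = first_multiple_in_interval(1.0, 3.0, 3.1)

# theta = pi/2 is rational in degrees: its multiples only visit 0, pi/2, pi, 3*pi/2,
# so the interval (0.1, 1.0) is never reached.
MISS = first_multiple_in_interval(math.pi / 2, 0.1, 1.0, max_m=1000)
```

A finite search cannot prove density, of course; it only exhibits one witness m for one interval.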

Checking our 0 for the condition of the above lemma we find that cos 8 
= Re A/|A| = $ which implies that @ is irrational in degrees. Let o™ and o"* 
be two words in T(A, A) such that m,0 + y = à, m0 + y = f, —n/2 «a, 
B < n/2 and assume that & < $. Then there is m, such that 2/2 < y + (m 
+ m,)0<2/2+(B — 4)/2 by the second statement of Lemma 3.7. It 
follows that y + (m, + m, 0 < 2/2 — (B —a)/2. Thuso™*™ ¢ T(A, 4) 
while o™*™ € T(A, A). We have proved that any two words in T(A, A) [this 
set is infinite by the second statement in Lemma 3.7] are nonequivalent accord- 
ing to Nerode's equivalence [Theorem 2.1] and Nerode's equivalence is there- 
fore of infinite index. This completes the proof. 


Remarks 


1. Note that the cut-point λ used in Theorem 3.6 is a rational number.
Thus the regularity or irregularity of events of the form T(A, λ) is not connected
to the rationality or irrationality of λ, as one might guess from Theorem 3.2.


†We use here the notation ν for eigenvalues in order to avoid confusion with the
cut-point notation λ.

2. Theorem 3.6 provides an example of a PA over a single letter alphabet
inducing a nonregular event. This is, however, not true in general; in other
words, there are many cases where such a PA defines a regular event. See
Exercises 10-13 after this section for more details on PAs over a single letter
alphabet.


3. The example in Theorem 3.6 also provides an explicit case of a
non-context-free event representable in a PA. This follows from the fact that all
context-free events over a one letter alphabet are regular, and the event in
Theorem 3.6 is not regular and therefore not context free.


EXERCISES 


1. An "m-adic two-state PA" is a 2-state PA A = (S, π, {A(σ)}, η^F) over the
alphabet Σ = {0, 1, ..., m − 1} where

    A(i) = | (m − i)/m         i/m       |
           | (m − i − 1)/m     (i + 1)/m |,     i = 0, 1, ..., m − 1,

    π = (1 0),    η^F = (0 1)^T.


Prove that if x = σ_1 ··· σ_k ∈ Σ*, then p(x) = .σ_k ··· σ_1, this being an
ordinary m-adic fraction.
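
Before attempting the proof, the claim of Exercise 1 can be checked on small cases. The following sketch is not from the original text; the function names are hypothetical, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

def m_adic_matrix(i, m):
    """Transition matrix A(i) of the m-adic two-state PA."""
    return [[Fraction(m - i, m), Fraction(i, m)],
            [Fraction(m - i - 1, m), Fraction(i + 1, m)]]

def accept_prob(word, m):
    """pi * A(word) * eta^F with pi = (1 0) and eta^F = (0 1)^T."""
    dist = [Fraction(1), Fraction(0)]
    for sym in word:
        a = m_adic_matrix(sym, m)
        dist = [dist[0] * a[0][0] + dist[1] * a[1][0],
                dist[0] * a[0][1] + dist[1] * a[1][1]]
    return dist[1]

def reversed_fraction(word, m):
    """The m-adic fraction .s_k ... s_1: the last symbol read is the leading digit."""
    return sum(Fraction(d, m ** (pos + 1)) for pos, d in enumerate(reversed(word)))
```

For example, with m = 3 the word 1 2 gives p = 2/3 + 1/9 = 7/9, that is, the 3-adic fraction .21.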

2. Prove that if the symbol 1 is removed from the alphabet of the 3-adic PA,
then the set of values {p(x) : x ∈ Σ*, Σ = {0, 2}} is a nowhere dense set
[Cantor's discontinuum].


3. Prove that Theorem 3.2 is true for m-adic automata. 


4. A number μ is called accessible by a PA A if there is a word x ∈ Σ* such
that p^A(x) = μ. Prove that if λ is a rational number which is not accessible
for the PA A in Exercise 2, then T(A, λ) is a nonregular set for that λ.


5. Let A be the 3-state PA over Σ = {0, 1, ..., m − 1} such that π = (1 0 0),

    η^F = (0 0 1)^T    and    A(i) = | 1/m    (m − i − 1)/m    i/m |
                                     | 0      1                0   |
                                     | 0      0                1   |


Prove that T(A, λ) is the event {x = σ_1 ··· σ_k : .σ_1 ··· σ_k > λ}.


6. Let φ be a real valued function over Σ* such that φ(e) = 0 [e is the empty
word] and for all x ∈ Σ*, σ ∈ Σ,

    φ(σx) = a(σ)φ(x) + b(σ)

where a(σ) + b(σ) ≤ 1. Prove that any event of the form {x : φ(x) > λ} can
be represented by a 3-state PA with cut-point λ.


7. Same as Exercise 6, but φ(xσ) = a(σ)φ(x) + b(σ) and the PA has 2 states.


8. Let ψ be a mapping from symbols in Σ to words in Σ* and extend ψ to
Σ* by the requirements

    ψ(e) = e,    ψ(xσ) = ψ(x)ψ(σ)

Let Σ = {0, 1, ..., m − 1} and denote by .ψ(x) the m-adic expansion where
the symbols in ψ(x) are considered as digits [.ψ(e) = 0]. Prove that the event
{x : .ψ(x) > λ} can be represented in a 3-state automaton with cut-point λ.


9. Prove that if in Exercise 8 the function ψ has the property that ψ(i) = x_i,
with l(x_i) = k for a fixed number k ≥ 1 and all i ∈ Σ, then the event {x : .ψ(x)
> λ} is not regular if and only if λ is an irrational number of the form
λ = .ψ(σ_1)ψ(σ_2) ···.

10. Prove that a 3-state PA A = (S, π, A(σ), η^F) over a single letter alphabet
defines an irregular event if and only if (1) πA(σ) ≠ π, (2) A(σ) has an imaginary
eigenvalue with argument irrational in degrees, and (3) the cut-point λ is
equal to lim_{n→∞} a_ij^(n) [which "lim" always exists if (2) is satisfied and is
independent of i] if F = {s_j}, and is equal to lim_{n→∞} a_ij^(n) + lim_{n→∞} a_ik^(n) if
F = {s_j, s_k}.

11. Prove that the number of nonregular events of the form T(A, λ) where A
is a given n-state PA over a single letter alphabet is ≤ n.


12. Prove the following theorem: 


Theorem: Let A be a PA over a single letter alphabet. Let v,,. . ., v, be the 
eigenvalues of A(o) such that |v,] = --- = |v,] = 1 and let v,,---, v,,, be 
the eigenvalues of A with maximum absolute value such that |v,| = --- = 
Vora] < 1 and such that $72*? vZ@,,,(m) #0 for all m — mo, where m, is 
some integer and @,,, is as in formula (4) in Section II, A, 5 [A, in that formula 
is replaced by v, here]. If arg v,, arg v,.,, :- +, arg v,,, are all rational in 
degree then 7(A, A) is a regular event for any 4. 

13. Prove the following corollary to the theorem in Exercise 12: If a PA A as
in Exercise 12 has all its eigenvalues with rational arguments then T(A, λ) is
regular for any λ.

14. Prove that if a PA A over a one letter alphabet has only real eigenvalues,
then T(A, λ) is regular for any λ.

15. Prove that any 2-state PA A over a one letter alphabet defines a regular
event T(A, λ).
OPEN PROBLEMS


1. Find a decision procedure for checking whether any two given distributions
for a PA are λ-equivalent for a given λ, or prove that the problem is not
decidable.


2. Provide a procedure with the aid of which one will be able to find a PA B
with a minimal number of states such that the event T(B, μ) for some μ equals
a given event T(A, λ).


4. Particular Cases 


a. Exclusive PCEs 


The class of events to be dealt with in this section is properly included in 
the class of PCEs and properly includes regular events. In addition they have 
most of the closure properties regular events have. 


Definition 4.1: An event of the form {x : x ∈ Σ*, p^A(x) ≠ λ} where A is a PA
and λ is a real number 0 ≤ λ < 1 is an exclusive PCE and is denoted by
the notation T_≠(A, λ).


Proposition 4.1: The regular events are properly included in the class of ex- 
clusive PCEs. 


Proof: It is clear that any regular event can be represented in the form
T_≠(A, 0) where A is a deterministic automaton [see the proof of Proposition
1.2]. On the other hand, for the PA given in Exercise 1.4, we have that 
T.(A, 1) is the complement of the event E = (x:x € E*, x = 0,"*'0,0,", 
m > 0} which is not a regular event. Regular events being closed under
complementation, we have that T_≠(A, 1/2) is not regular and this completes the
proof.


Proposition 4.2: The class of exclusive PCEs is properly included in the class 
of PCEs. 


Proof: Let T_≠(A, λ) be an exclusive PCE. By Proposition A,2.1 there is an
SPE p^B such that for all words x ∈ Σ*, p^B(x) = p^A(x) − λ so that T_≠(A, λ)
= T_≠(B, 0). Let C be the pseudo probabilistic automaton C = B ⊗ B (see
definition in the proof of Proposition A,1.3). As in the proof of Proposition
A,1.3, p^C(x) = p^B(x)p^B(x) for all x ∈ Σ*. Thus T_≠(B, 0) = T(C, 0). Now using
Proposition 1.1 we have that there is a PA D and a cut-point μ such that T(C, 0)
= T(D, μ). This proves that any exclusive PCE is a PCE, since T(D, μ)
= T_≠(A, λ). To prove that the inclusion is proper, let A be the PA defined
in Exercise 1.4. Then T(A, 1/2) = {σ_1^m σ_2 σ_1^n : m ≤ n} so that the event E =
{σ_1^m σ_2 σ_1^n : m ≤ n} is a PCE. We will show that this event is, however, not an
exclusive PCE. Assume the contrary; then there is a PA B such that E =
T_≠(B, 1/2) [by Proposition 1.4 λ may always be assumed to be equal to 1/2]. Using
Theorem 2.8 in Section II,C with u = σ_1, we have that there is a sequence of
numbers c_0, c_1, ..., c_{n−1} such that Σ_{i=0}^{n−1} c_i = 1 and for the words w = σ_1^n σ_2
and u′ = e [the empty word] the following equality holds true:

    p^B(σ_1^n σ_2 σ_1^n) = c_{n−1} p^B(σ_1^n σ_2 σ_1^{n−1}) + ··· + c_1 p^B(σ_1^n σ_2 σ_1) + c_0 p^B(σ_1^n σ_2)

But the words σ_1^n σ_2 σ_1^{n−1}, ..., σ_1^n σ_2 are not in E and therefore their probability
by B is 1/2. This implies that p^B(σ_1^n σ_2 σ_1^n) = (1/2) Σ_{i=0}^{n−1} c_i = 1/2, which contradicts
the fact that p^B(σ_1^n σ_2 σ_1^n) ≠ 1/2 since σ_1^n σ_2 σ_1^n ∈ E. The proof is complete.

Theorem 4.3: The class of exclusive PCEs is closed under union.


Proof: Let T_≠(A, λ) and T_≠(B, μ) be two exclusive PCEs. As in the proof
of Proposition 4.2, there are pseudo probabilistic automata C and D such that
T_≠(A, λ) = T(C, 0) and T_≠(B, μ) = T(D, 0) with p^C(x) ≥ 0, p^D(x) ≥ 0 for all
x ∈ Σ*. Then T(C + D, 0) = T(C, 0) ∪ T(D, 0) where C + D is the
automaton defining the function p^{C+D}(x) = p^C(x) + p^D(x) [see Proposition A,2.1].
By Proposition 1.1 there is a PA A′ such that T(A′, ν) = T(C + D, 0) =
T_≠(A, λ) ∪ T_≠(B, μ) for some cut-point ν and, as follows from the proof of
that proposition, p^{A′}(x) ≥ ν for all x ∈ Σ* so that T(A′, ν) = T_≠(A′, ν) as
required.

Theorem 4.4: The class of PCEs is closed under intersection with exclusive
PCEs.


Proof: Let T(A, λ) and T_≠(B, μ) be a PCE and an exclusive PCE
respectively. Then there are automata A′ and B′ such that T(A, λ) = T(A′, 0) and
T_≠(B, μ) = T_≠(B′, 0). It follows that T(A′ ⊗ B′ ⊗ B′, 0) = T(A, λ) ∩ T_≠(B, μ).
By Proposition 1.1 there is a PA C and a cut-point ν such that T(A′ ⊗ B′
⊗ B′, 0) = T(C, ν) and this completes the proof.

Proposition 4.5: The class of exclusive PCEs is closed under intersection.

Proof: One can assume that T_≠(A, λ) = T_≠(A′, 0), and similarly T_≠(B, μ)
= T_≠(B′, 0), so that T_≠(A′ ⊗ B′, 0) = T_≠(A, λ) ∩ T_≠(B, μ).
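
The product constructions used in the proofs above can be tried out on a small case. The sketch below is not from the original text. It assumes that ⊗ is realized by Kronecker products of the initial vectors, transition matrices and final vectors, and it obtains p^B = p^A − λ by shifting the final vector by λ, which works because the rows of every A(x) sum to 1; the constructions in Propositions A,1.3 and A,2.1 may differ in detail. All function names are hypothetical, and the example PA is the 2-adic one from Exercise 1 of Section 3.

```python
from fractions import Fraction
from itertools import product

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def value(pi, mats, eta, word):
    """pi * A(w_1) ... A(w_n) * eta."""
    row = list(pi)
    for sym in word:
        mat = mats[sym]
        row = [sum(row[i] * mat[i][j] for i in range(len(row)))
               for j in range(len(mat[0]))]
    return sum(r * e for r, e in zip(row, eta))

def tensor(first, second):
    """Pseudo automaton whose value is the product of the two given values."""
    pi1, mats1, eta1 = first
    pi2, mats2, eta2 = second
    pi = kron([pi1], [pi2])[0]
    eta = [r[0] for r in kron([[e] for e in eta1], [[e] for e in eta2])]
    mats = {s: kron(mats1[s], mats2[s]) for s in mats1}
    return pi, mats, eta

half = Fraction(1, 2)
A = ([1, 0], {0: [[1, 0], [half, half]], 1: [[half, half], [0, 1]]}, [0, 1])
LAM = half
B = (A[0], A[1], [e - LAM for e in A[2]])  # p_B(x) = p_A(x) - LAM
C = tensor(B, B)                           # p_C(x) = (p_A(x) - LAM)^2 >= 0
```

Here p^C(x) > 0 exactly when p^A(x) ≠ λ, which is the step T_≠(B, 0) = T(C, 0) in the proof of Proposition 4.2.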


EXERCISES 

1. Let A be the PA in Proposition A,1.10. Prove that the event T(A, 4) is not 
an exclusive PCE. 

2. Prove that the event {0^n 1^n : n = 1, 2, ...} is not an exclusive PCE.


3. Define the event T_=(A, λ) = {x : x ∈ Σ*, p^A(x) = λ} for a given PA and a
cut-point λ. Prove that the class of events of the form T_=(A, λ) is closed under
union and intersection and includes the regular events.

4. Prove that the event {0^n 1^n : n ≥ 1} can be written in the form T_=(A, λ)
where A is a PA with cut-point λ.


OPEN PROBLEM


Are the events T_=(A, λ) included in the class of PCEs and, if yes, is the
inclusion proper?


b. Definite PCEs 


Definition 4.2: A PA A is weakly k-definite if and only if for any x ∈ Σ* with
l(x) ≥ k and any initial distributions π and ρ† we have that p_π^A(x) = p_ρ^A(x).

Proposition 4.6: If A is a weakly k-definite PA and z = yx is a word in Σ*
with l(x) ≥ k, then p^A(z) = p^A(x).

Proof: p_π^A(z) = πA(y)A(x)η^F = π(y)A(x)η^F = p_{π(y)}^A(x) = p_π^A(x) [p^A(x) is
independent of the initial distribution since l(x) ≥ k].

Corollary 4.7: If A is a weakly definite PA, then T(A, λ) is a definite PCE for
any cut-point.

Proof: It follows from Proposition 4.6 that the set of values p^A(x) is finite
[its size is at most the number of different values in the set {p^A(x) : l(x) ≤ k}]
and for any z = yx, l(x) ≥ k and any λ, p^A(z) = p^A(x) so that p^A(z) > λ if and
only if p^A(x) > λ.
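
Definition 4.2 suggests a finite test. Taking degenerate initial distributions shows that p_π(x) is independent of π exactly when the column vector A(x)η^F has all entries equal, and since the matrices are stochastic it is enough to examine words of length exactly k (a constant column stays constant when multiplied on the left by a stochastic matrix). The sketch below is not from the original text; the function names and the two example automata are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def mat_vec(mat, col):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * c for a, c in zip(row, col)) for row in mat]

def is_weakly_k_definite(mats, eta, k):
    """True if A(x) * eta is a constant column for every word x of length k."""
    for word in product(sorted(mats), repeat=k):
        col = list(eta)
        for sym in reversed(word):
            col = mat_vec(mats[sym], col)
        if any(c != col[0] for c in col):
            return False
    return True

half = Fraction(1, 2)
# Every matrix has identical rows, so one input letter erases the past.
RESET = {0: [[half, half], [half, half]], 1: [[1, 0], [1, 0]]}
# The 2-adic PA of Exercise 1, Section 3, which is not weakly definite.
DYADIC = {0: [[1, 0], [half, half]], 1: [[half, half], [0, 1]]}
ETA = [0, 1]
```

The first automaton is weakly 1-definite but not weakly 0-definite; the second fails the test for every k that is tried.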
Proposition 4.8: A PA A is weakly k-definite if and only if for any word
x ∈ Σ* with l(x) ≥ k the vectors π(x) and ρ(x) are equivalent for any initial
distributions π and ρ.

Proof: If π(x) is equivalent to ρ(x), then p_π(x) = π(x)η^F = ρ(x)η^F = p_ρ(x)
by Definition 3.1. Conversely, if p_π(x) = p_ρ(x) for all x ∈ Σ* with l(x) ≥ k
then, for any y ∈ Σ* and x ∈ Σ* with l(x) ≥ k we have that p_π(xy) = p_ρ(xy).
But p_π(xy) = π(x)η^F(y) and p_ρ(xy) = ρ(x)η^F(y). Thus for all y ∈ Σ*, π(x)η^F(y)
= ρ(x)η^F(y), which implies that π(x) and ρ(x) are equivalent vectors.

Corollary 4.9: If A is a weakly k-definite PA, then for any x ∈ Σ* with
l(x) ≥ k the rows of the matrix A(x) considered as distributions are equivalent
one to the other.


Proof: Any row in a matrix A(x) can be written in the form π(x) where π
is a degenerate stochastic vector.


Remark: The converse of Corollary 4.9 is also true and is left as an exercise.


†For the purpose of this definition it is assumed that A does not have an a priori fixed
initial distribution.
Definition 4.2: A PA is k-definite if it is weakly k-definite, but is not weakly
(k − 1)-definite. It is definite if it is k-definite for some k ≥ 0.


It follows from the above definition and from Proposition 4.8 that a PA is
k-definite if and only if any two vectors π(x) and ρ(x) with l(x) ≥ k
are equivalent, but there are two nonequivalent vectors π(x) and ρ(x) with
l(x) = k − 1.

As in the case of SSMs one can define, for a given PA A, a matrix H^A such
that its columns are a basis for all column vectors of the form η^F(x), x ∈ Σ*
[see Section I,B,1]. The first two columns of H^A will be η^F and η^F̄ [F̄ = S − F].
The procedure for constructing H^A will be exactly the same as that used for
constructing H, and the rank of a PA A will be defined as rank A = rank H^A.
[This rank is always less than or equal to the number of states of A.] As the
columns of H^A are a basis for the set of vectors η^F(x), we have that two vectors
π and ρ are equivalent initial distributions for A if and only if (π − ρ)H^A = 0.
We are now able to prove the following:


Theorem 4.10: Let A be a PA. If A is k-definite, then k ≤ rank A − 1.


Proof: The proof is almost the same as the proof of Theorem 4.11 in Section
II,A. As in that proof, we define the set of matrices K^i = {A(x) : l(x) = i} [to
avoid ambiguity we use here the notation K^i instead of H^i there] and the linear
spaces V = {v = (v_j) : Σ_j v_j = 0},

    VK^i = {Σ_r v_r A(x_r) : v_r ∈ V, A(x_r) ∈ K^i, r = 1, 2, ...}

so that all the statements (a)-(d) in the proof of Theorem 4.11 in Section II,A
are still true. As for statement (e), we change it to the following statement (e′):
If the PA is k-definite, then the space VK^k is the nullspace of H^A [i.e.,
dim VK^k = n − rank H^A, where n is the number of states of A].

To prove this statement, assume it is not true. Then there is a vector of the
form v = Σ_i v_i A(x_i) such that v_i ∈ V, x_i ∈ Σ*, l(x_i) = k and vH^A ≠ 0. This
implies that at least one of the summands v_i A(x_i) has this property, i.e., there
is a v_i ∈ V and a matrix A(x_i) such that l(x_i) = k and v_i A(x_i)H^A ≠ 0. Let
v_i = (v_ij) with Σ_j v_ij = 0 and write v_ij^+ = max(v_ij, 0), v_ij^− = max(−v_ij, 0).
Then, setting Σ_j v_ij^+ = Σ_j v_ij^− = c [c ≠ 0 necessarily], we define the two
distributions π = (π_j) with π_j = v_ij^+/c and ρ = (ρ_j) with ρ_j = v_ij^−/c. It
follows that c(π − ρ)A(x_i)H^A = v_i A(x_i)H^A ≠ 0 or (π(x_i) − ρ(x_i))H^A ≠ 0, this
implying that π(x_i) is not equivalent to ρ(x_i) although l(x_i) = k, and this is
a contradiction.

Continuing the same way as in the end of the proof of Theorem 4.11 in
Section II,A we have a sequence of decreasing numbers

    n − 1 = dim VK^0 > dim VK^1 > ··· > dim VK^k = n − rank H^A

Hence, k ≤ rank H^A − 1.


Corollary 4.11: If the rank of a k-definite PA A equals the number n of its
states, then k ≤ n − 1 and the matrices A(x) corresponding to words x with
l(x) ≥ k are all constant matrices.

Proof: The first statement of the corollary is evident. As for the second
statement, any two rows in a matrix A(x) with l(x) ≥ k considered as
distributions are equivalent, but no two different distributions π and ρ can satisfy
the equation (π − ρ)H^A = 0 if rank H^A = n [i.e., H^A is a nonsingular matrix].


EXERCISES 


1. Prove that if the rows in any matrix A(x), l(x) ≥ k, of a given PA A are
equivalent one to the other, then A is a weakly k-definite PA.

2. Let A be a k-definite PA. Prove that there are two distinct distributions for
A which are j-equivalent [two distributions π and ρ are j-equivalent for a PA
A if p_π(x) = p_ρ(x) for all x with l(x) ≤ j] where j = 1, 2, ..., k − 1.

3. Prove that if A is a definite PA, then the matrices A(σ) are all singular.

4. Prove that if the set of matrices (/4(u):u € X*] in Exercise 1.7 is a k- 


definite set, then the event T(M, 4, y) is definite. Find the order of definiteness 
of T(M, A, y) in this case. 


c. Quasidefinite PCEs 


Definition 4.3: A PA A is quasidefinite if and only if for any ε > 0 there is a
number k(ε) such that for all x ∈ Σ* with l(x) ≥ k(ε) and any two initial
distributions for A, π and ρ,† |p_π(x) − p_ρ(x)| < ε.

Proposition 4.12: If A is a quasidefinite PA, then for any ε > 0 there is a
number k(ε) such that for all x ∈ Σ* with l(x) ≥ k(ε) and any y ∈ Σ* we
have that |p_π^A(yx) − p_π^A(x)| < ε.

Proof: p_π(yx) = p_{π(y)}(x) and |p_{π(y)}(x) − p_π(x)| < ε by definition.

Proposition 4.13: A sufficient condition for a PA A to be quasidefinite is: For
any ε > 0, there is a number k(ε) such that for all x with l(x) ≥ k(ε) and any
two initial distributions π and ρ for A, ||π(x) − ρ(x)|| ≤ ε. [If π = (π_i) is a
vector then ||π|| = Σ_i |π_i|.]

Proof: If ||π(x) − ρ(x)|| ≤ ε, then

    |p_π(x) − p_ρ(x)| = |π(x)η^F − ρ(x)η^F|
                      = |(π(x) − ρ(x))η^F| ≤ ||π(x) − ρ(x)|| ≤ ε

since the entries in η^F are either 0 or 1.
Proposition 4.14: The condition specified in Proposition 4.13 holds for a PA A
if and only if the corresponding system (S, {A(σ)}) is a weakly ergodic
Markov system [see Definition 3.2 in Section II,A].


†For the purpose of this definition it is assumed that A does not have an a priori fixed
initial distribution.


176 Chapter III. Events, Languages, and Acceptors 


Proof: It follows from Proposition A,1.4 in Chapter II that for any fixed
word x ∈ Σ*,

    ||π(x) − ρ(x)|| = ||(π − ρ)A(x)|| ≤ ||π − ρ|| δ(A(x)) ≤ 2δ(A(x))

since ||π − ρ|| ≤ 2 for any two stochastic vectors π and ρ. On the other hand
there are indices i_1 and i_2 such that δ(A(x)) = Σ_j (a_{i_1 j}(x) − a_{i_2 j}(x))^+ and if π
and ρ are the degenerate vectors having a 1 in the i_1 and i_2 entries respectively,
then ||(π − ρ)A(x)|| = Σ_j |a_{i_1 j}(x) − a_{i_2 j}(x)| = 2 Σ_j (a_{i_1 j}(x) − a_{i_2 j}(x))^+ = 2δ(A(x))
so that, for the specific vectors ρ and π as above, ||π(x) − ρ(x)|| = 2δ(A(x)).
It follows that lim_{l(x)→∞} ||π(x) − ρ(x)|| = 0, not depending on the choice of π
and ρ, if and only if lim_{l(x)→∞} δ(A(x)) = 0.
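
The two facts used in this proof can be checked on a small matrix. The sketch below is not from the original text; it assumes the functional δ(A) = max over row pairs (i_1, i_2) of Σ_j (a_{i_1 j} − a_{i_2 j})^+, which is the form appearing in the proof above, and the 3 × 3 stochastic matrix is an arbitrary illustration.

```python
from fractions import Fraction

def delta(mat):
    """max over row pairs (r1, r2) of the sum of positive parts of r1 - r2."""
    return max(sum(max(x - y, 0) for x, y in zip(r1, r2))
               for r1 in mat for r2 in mat)

def l1(vec):
    """The norm ||v|| = sum of absolute values of the entries."""
    return sum(abs(v) for v in vec)

def row_times(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

q, h = Fraction(1, 4), Fraction(1, 2)
M = [[h, h, 0], [0, h, h], [q, q, h]]   # illustrative stochastic matrix

DIFF_EXTREMAL = [1, -1, 0]              # degenerate pi, rho on rows 1 and 2
DIFF_GENERAL = [h, h, -1]               # pi = (1/2 1/2 0), rho = (0 0 1)
```

For this matrix δ(M) = 1/2; the degenerate pair attains ||(π − ρ)M|| = 2δ(M) = 1, while the general pair gives ||(π − ρ)M|| = 1/2 ≤ ||π − ρ|| δ(M) = 1.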


Remark 1: It is easy to verify that the condition of Proposition 4.13 is not a
necessary condition for quasidefiniteness; on the other hand the condition is
decidable, by Proposition 4.14 and Section II,A, Corollary 4.6 and Theorem
4.7. More precisely we have the following:


Theorem 4.15: Let A be a PA. If for every ε there is a number k = k(ε) such
that ||π(x) − ρ(x)|| < ε for any distributions π and ρ and any x ∈ Σ* such
that l(x) ≥ k(ε), then the system (S, {A(σ)}) satisfies the condition H_2 of some
order less than or equal to (1/2)(3^n − 2^{n+1} + 1) where |S| = n [see Definition 4.4,
Section II,A].


Remark 2: It is easily verified that the PA used in the proof of Theorem
3.1 is quasidefinite. This shows that the class of PCEs which can be defined by
quasidefinite PAs is nondenumerable. The concept of quasidefiniteness is thus
a proper generalization of the concept of definiteness.

We shall consider now quasidefinite PAs with isolated cut-point.


Theorem 4.16: If A is a quasidefinite PA and λ is an isolated cut-point for it,
then T(A, λ) is a definite [regular] event.


Proof: λ being an isolated cut-point, there is some ε > 0 such that
|p^A(x) − λ| ≥ ε for all x ∈ Σ*. Since A is quasidefinite, there is a number
k = k(ε/2) for the above ε > 0 such that |p(yx) − p(x)| < ε/2 for all x ∈ Σ*
with l(x) ≥ k(ε/2) and all y ∈ Σ* [see Proposition 4.12]. We have therefore,
for all x with l(x) ≥ k(ε/2) and all y ∈ Σ*, that p(yx) > λ if and only if
p(x) > λ. It follows that T(A, λ) = U_1 ∪ U_2 where U_1 = {x : l(x) < k(ε/2),
p(x) > λ} and U_2 = {x : x = yz, l(z) = k(ε/2), p(z) > λ}. But U_1 is a finite set
and therefore definite, and U_2 can be written in the form U_2 = Σ*V where V
is the finite set V = {z : l(z) = k(ε/2), p(z) > λ}. Thus U_2 is definite and, since
definite events are closed under union, we have that also T(A, λ) is definite.

Remark: It is worth mentioning that the conditions of Theorem 4.16 may 
serve as a characterization of definite events, for the following converse of that 
theorem is also true. 
Theorem 4.17: Any definite event E can be represented in the form E = T(A, λ)
where A is a quasidefinite PA and λ is an isolated cut-point for A.


Proof: Given the k-definite event E = Σ*U ∪ V with U and V finite and the
length of all words in U equal to k [any definite event can be written in this
form], we define the PA A = (S, π, {A(σ)}, F) over the alphabet Σ as follows:
Let Σ′ = Σ ∪ {b}, b ∉ Σ, then

    S = {(b, ..., b, σ_1, ..., σ_r) : σ_i ∈ Σ, 0 ≤ r ≤ k}

i.e., the states are k-tuples of symbols in Σ′ and only k-tuples of the above form
are in S. If |Σ| = m, then |S| = 1 + m + m^2 + ··· + m^k = q and there is a
1-1 correspondence between the states in S and the words in Σ* with length
≤ k. The initial distribution π is the degenerate distribution having a 1 in the
entry corresponding to the state (b, ..., b). The set of final states F contains
all the states corresponding to words in U ∪ V. Finally, the transition matrices
are defined as follows. If i is the state (τ_1, ..., τ_k), then

    a_ij(σ) = 1 − ε         if j = (τ_2, ..., τ_k, σ)
    a_ij(σ) = ε/(q − 1)     otherwise

If x ∈ Σ* is a word x = σ_1 ··· σ_r, r < k, and i is the state i = (b, ..., b),
then a_ij(σ_1 ··· σ_r) ≥ (1 − ε)^r for j = (b, ..., b, σ_1, ..., σ_r). If x ∈ Σ* is a
word x = σ_1 ··· σ_k and i is any state, then a_ij(σ_1 ··· σ_k) ≥ (1 − ε)^k for
j = (σ_1, ..., σ_k). If x, y ∈ Σ* are words such that x = σ_1 ··· σ_k, then for
j = (σ_1, ..., σ_k), and for any state i,

    a_ij(yx) = Σ_{t=1}^{q} a_it(y) a_tj(x) ≥ (1 − ε)^k Σ_{t=1}^{q} a_it(y) = (1 − ε)^k

It follows that if x ∈ E then p^A(x) ≥ (1 − ε)^k [if r < k, then (1 − ε)^r
≥ (1 − ε)^k] and we may choose ε so small as to have the value (1 − ε)^k as
close to 1 as wanted [the number k is given a priori and depends on E only].
Let ε be such that (1 − ε)^k > 1/2 and let λ = 1/2. For any word x, if x ∉ E,
then p^A(x) < 1/2 [the machine will enter a state not in F with probability greater
than 1/2 in this case] so that λ is isolated. Moreover, the above considerations show
that E = T(A, 1/2). Finally, all the entries in the matrices A(σ) are positive and
this implies that the system (S, {A(σ)}) is a quasidefinite system [the H_2 condition of
order 1 is satisfied in this case].
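
The construction in the proof of Theorem 4.17 is concrete enough to be tried on a small definite event. The sketch below is not from the original text: the function names are hypothetical, None plays the role of the extra symbol b, floating point is used instead of exact arithmetic, and the example event E = Σ*{01} ∪ {1} with k = 2 and ε = 0.1 is an assumption chosen for illustration. The words of V are assumed to be shorter than k.

```python
from itertools import product

BLANK = None  # stands for the extra symbol b

def build_pa(alphabet, k, accepted, eps):
    """Transition matrices and final vector; 'accepted' is U union V as strings."""
    states = [(BLANK,) * (k - r) + w
              for r in range(k + 1) for w in product(alphabet, repeat=r)]
    index = {s: n for n, s in enumerate(states)}
    q = len(states)
    mats = {}
    for sym in alphabet:
        mat = [[eps / (q - 1)] * q for _ in range(q)]
        for s in states:
            # the shift (t_2, ..., t_k, sym) receives probability 1 - eps
            mat[index[s]][index[s[1:] + (sym,)]] = 1 - eps
        mats[sym] = mat
    final = [1.0 if "".join(c for c in s if c is not BLANK) in accepted else 0.0
             for s in states]
    return mats, final

def accept_prob(mats, final, word):
    """Start in state (b, ..., b), which is state number 0, and read the word."""
    dist = [0.0] * len(final)
    dist[0] = 1.0
    for sym in word:
        mat = mats[sym]
        dist = [sum(dist[i] * mat[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return sum(d * f for d, f in zip(dist, final))

def in_event(word, k, accepted):
    """Membership in E: decided by the last k letters, or by the whole short word."""
    return (word[-k:] if len(word) >= k else word) in accepted

ACCEPTED = {"01", "1"}
MATS, FINAL = build_pa("01", 2, ACCEPTED, 0.1)
```

With these parameters (1 − ε)^k = 0.81, so every word in E is accepted with probability at least 0.81 and every other word with probability at most 0.19; the cut-point 1/2 is therefore isolated.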


EXERCISES 


1. Prove by an example that the condition of Proposition 4.13 is not
necessary for quasidefiniteness.


2. Provide a full proof for Theorem 4.15. 
3. Let λ̄ be a vector all the entries of which are equal to λ, 0 < λ < 1, and
let A be a PA with number of states equal to the dimension of λ̄. λ̄ is an
isolated vector for A if there is a number δ > 0 such that

    (A(x)η^F − λ̄)(A(x)η^F − λ̄) ≥ δ

for all x ∈ Σ*, where the product is the ordinary scalar product of vectors.


Prove: If λ̄ is an isolated vector for a quasidefinite PA A, then the event
T(A, λ) is a definite [regular] event.


4. Prove that if A is a two-state PA such that no matrix A(σ) equals the matrix

    | 1  0 |        | 0  1 |
    | 0  1 |   or   | 1  0 |,

then A is a quasidefinite PA.


5. Prove that if A is a PA such that all the entries in all the matrices A(σ) are
positive, then A is a quasidefinite PA.


5. Approximations 


We know already that the cardinality of PCEs [over the real numbers] equals 
the cardinality of the continuum [Theorem 3.2] and therefore there must be 
PCEs which are not definable by any type of deterministic automaton [all 
deterministic machines, including Turing machines, are denumerable]. On the 
other hand we know also that if the cut-point A is isolated, then the resulting 
PCE is regular [Theorem 2.3]. This raises the suspicion that probabilistic 
automata may reduce to deterministic automata when compared in a weaker 
form, allowing for approximations in the vicinity of the cut-point. To make 
this notion explicit we introduce the following: 


Definition 5.1: Let A be a PA inducing the PE f over Σ* and let B be any
finite state machine [Turing machine, linear bounded, etc.]. B ε-approximates
A if there is a function φ with domain {B(s_0, x)} [B(s_0, x) denoting the
configuration of B after the word x has been scanned from the initial state s_0] and real
values such that, for all x ∈ Σ*,

    |f(x) − φ(B(s_0, x))| ≤ ε

Definition 5.2: An event E [understood here as a subset of Σ*] ε-approximates
a PCE T(A, λ) if

    (E − T(A, λ)) ∪ (T(A, λ) − E) ⊆ {x : x ∈ Σ*, |f(x) − λ| ≤ ε}

where f is the event [here understood as a function] induced by A.

It is easy to prove that, in the above sense, PAs [the matrices and vectors
defining A have real entries] are approximable by Turing machines, this being
a consequence of the fact that Turing machines can "compute" within any
preassigned ε the values of a function f(x) induced by a PA. The above
definitions will therefore enable us to compare the nondenumerable set of PAs
[or cut-point events defined by them] with denumerable sets [e.g., Turing
machines and events defined by them]. Some particular cases will be considered
in the following subsections.


a. €-Approximation by Finite Automata 


Definition 5.3: Given a PE f and $\epsilon > 0$, an $\epsilon$-cover induced by f is a finite 
set $\{C_i\}_{i=0}^{k}$ where the $C_i$ are sets of points in the interval [0, 1] satisfying the 
following requirements: 

1. $\bigcup_{i=0}^{k} C_i = \{\xi : f(x) = \xi,\ x \in \Sigma^*\}$. 

2. $\xi_1, \xi_2 \in C_i \Rightarrow |\xi_1 - \xi_2| \le \epsilon$, $i = 0, 1, \ldots, k$. 

3. For any i and $\sigma$, there is j such that $C_i\sigma \subseteq C_j$, where $C_i\sigma$ is defined as the 
set $C_i\sigma = \{\xi : f(x\sigma) = \xi,\ f(x) \in C_i\}$. 

Theorem 5.1: Given a PE f and $\epsilon > 0$, f is $\epsilon$-approximable by a finite 
automaton B if and only if there exists a $2\epsilon$-cover induced by f. 


Proof: Let $\{C_i\}_{i=0}^{k}$ be an $\epsilon$-cover for f. Define the deterministic automaton 
B as follows. The states of B are $C_0, \ldots, C_k$. Let $C_0$ be the first set such that 
$f(e) \in C_0$; then the initial state of B is $C_0$. The transition function of B is 
defined by the relation 

$$B(C_i, \sigma) = C_j \quad \text{if } C_i\sigma \subseteq C_j$$

and j is the smallest index satisfying the relation. Finally, set 
$\phi(C_i) = \tfrac12[\sup_{\xi \in C_i} \xi + \inf_{\xi \in C_i} \xi]$. We prove first, by induction, that for any $x \in \Sigma^*$, 
$f(x) \in B(s_0, x)$: 

i. For x = e, the statement follows from the definition of B. 

ii. Let x be a word with l(x) = t and assume that $f(x) \in B(s_0, x) = C_i$. 
Then $f(x\sigma) \in C_i\sigma \subseteq B(s_0, x\sigma)$ by the definitions of $C_i\sigma$ and B, and this proves 
the statement. 

We have, therefore, that for any $x \in \Sigma^*$ 

$$|f(x) - \phi(B(s_0, x))| = \left|f(x) - \tfrac12\left[\sup_{\xi \in C_i} \xi + \inf_{\xi \in C_i} \xi\right]\right| < \epsilon$$

by the fact that $f(x) \in B(s_0, x) = C_i$ and the second property of the $\epsilon$-cover. 
Assume now that f is $\epsilon$-approximable by a deterministic automaton B with 
state set $S = \{s_0, \ldots, s_k\}$. Define the sets 

$$C_i = \{\xi : f(x) = \xi,\ B(s_0, x) = s_i\}$$

It is easily verified that the set $\{C_i\}$ thus defined is a $2\epsilon$-cover as required. 
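
The construction in the proof can be tried out on a toy case. The sketch below assumes a particular word function over {0, 1}, namely f(e) = 0 and f(xs) = (s + f(x))/2, and covers its range by the K sets "values lying in [i/K, (i+1)/K)". For this f one checks by hand that such a set is carried by each letter into a single set of the same family, so the third requirement of a cover holds. The names `K`, `f`, `delta`, `phi` are illustrative, and `phi` uses the midpoint of the enclosing interval as a stand-in for the midpoint of sup and inf used in the proof.

```python
from fractions import Fraction as Fr
from itertools import product

K = 4  # number of cover sets; each set has width 1/K

def f(word):
    """Toy word function (an assumption): f(e) = 0, f(x + s) = (s + f(x)) / 2."""
    value = Fr(0)
    for sym in word:
        value = (int(sym) + value) / 2
    return value

def delta(i, sym):
    """Index j of the cover set that contains C_i followed by sym."""
    return (i + K * int(sym)) // 2

def phi(i):
    """Midpoint of the interval [i/K, (i+1)/K) enclosing C_i."""
    return Fr(2 * i + 1, 2 * K)

def run(word):
    state = 0  # f(e) = 0 lies in C_0
    for sym in word:
        state = delta(state, sym)
    return state

def max_error(max_len):
    """Largest |f(x) - phi(B(s_0, x))| over all words of length <= max_len."""
    worst = Fr(0)
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            worst = max(worst, abs(f(word) - phi(run(word))))
    return worst
```

Under these assumptions the K-state automaton stays within 1/(2K) of f on every word, which is the bound the midpoint choice is meant to give.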
Definition 5.4: Given a PCE $E = T(A, \lambda)$, an $\epsilon$-cover induced by the 
automaton A with cut-point $\lambda$ is a finite set $\{C_i\}_{i=0}^{k}$, where the $C_i$ are sets of points 
in the interval [0, 1] satisfying the following requirements: 

1. $\bigcup_{i=0}^{k} C_i = \{\xi : f(x) = \xi,\ x \in \Sigma^*\}$. 

2. Either $C_i \subseteq \{\xi : \xi > \lambda - \epsilon\}$ or $C_i \subseteq \{\xi : \xi < \lambda + \epsilon\}$. 

3. For any i and $\sigma$, there is j such that $C_i\sigma \subseteq C_j$, where $C_i\sigma$ is defined as in 
Definition 5.3. 

Theorem 5.2: A PCE $E = T(A, \lambda)$ is $\epsilon$-approximable by a regular event 
E' = T(B) if and only if there exists an $\epsilon$-cover induced by the automaton A 
with cut-point $\lambda$. 

Proof: The proof is similar to the proof of Theorem 5.1. The final states 
of B will be the $C_i$ satisfying the relation 

$$C_i \subseteq \{\xi : \xi > \lambda - \epsilon\}$$

Proposition 5.3: Let f be a PE. If f is $\epsilon$-approximable by some finite 
automaton B, then for any $\lambda$, the CPE $T(A, \lambda)$ is $\epsilon$-approximable, where A is the 
PA defining f. 

Proof: The final states of B will be defined to be the states $s_i$ such that 
$\phi(s_i) > \lambda$ [see Definition 5.1]. For any $x \in \Sigma^*$, $|f(x) - \phi(B(s_0, x))| < \epsilon$, and if 
$\phi(B(s_0, x)) > \lambda$, meaning that $x \in T(B)$, then $f(x) > \lambda - \epsilon$. If $\phi(B(s_0, x)) \le \lambda$, 
meaning that $x \notin T(B)$, then $f(x) < \lambda + \epsilon$. 

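Proposition 5.4: Let f be a PE defined by a PA A such that for any $\lambda$ the CPE 
$T(A, \lambda)$ is $\epsilon$-approximable by an event $T(B_\lambda)$ where $B_\lambda$ is a finite [deterministic] 
automaton; then there exists a finite automaton B which $\epsilon$-approximates A. 

Proof: Divide the interval [0, 1] into k equal parts by k − 1 points $\lambda_1, \ldots, \lambda_{k-1}$ 
[$\lambda_0 = 0$, $\lambda_k = 1$] such that $\lambda_i - \lambda_{i-1} \le \epsilon$, $i = 1, 2, \ldots, k$, and let $B_{\lambda_i}$, 
$i = 0, 1, \ldots, k - 1$, be the corresponding $\epsilon$-approximating automaton for 
$T(A, \lambda_i)$. Define the machine B as follows. $B = (S, s_0, M)$ [F is immaterial 
here] with 

$$S = \{(s_{i_0}(\lambda_0), s_{i_1}(\lambda_1), \ldots, s_{i_{k-1}}(\lambda_{k-1})) : s_{i_j}(\lambda_j) \in S_{\lambda_j}\}$$

$$s_0 = (s_0(\lambda_0), \ldots, s_0(\lambda_{k-1}))$$

$$M((s_{i_0}(\lambda_0), s_{i_1}(\lambda_1), \ldots, s_{i_{k-1}}(\lambda_{k-1})), \sigma) = (M_{\lambda_0}(s_{i_0}, \sigma), \ldots, M_{\lambda_{k-1}}(s_{i_{k-1}}, \sigma))$$

with $B_{\lambda_i} = (S_{\lambda_i}, s_0(\lambda_i), M_{\lambda_i}, F_{\lambda_i})$ and $S_{\lambda_i} = \{s_j(\lambda_i)\}$. Set 

$$\phi(s) = \phi(s_{i_0}(\lambda_0), \ldots, s_{i_{k-1}}(\lambda_{k-1})) = \max\{\lambda_j : s_{i_j}(\lambda_j) \in F_{\lambda_j}\}$$

Thus, $\phi(B(s_0, x)) = \lambda_i$ implies that $x \in T(B_{\lambda_i})$ and $x \notin T(B_{\lambda_{i+1}})$, which 
implies that $f(x) > \lambda_i - \epsilon$ and $f(x) < \lambda_{i+1} + \epsilon \le \lambda_i + 2\epsilon$. It follows that 
$|\phi(B(s_0, x)) - f(x)| < 2\epsilon$. 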
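
The product construction of the proof can be sketched on a toy case. The sketch assumes a word function over {a, b} taking the value 1 on words ending in a, 1/2 on words ending in b and 0 on the empty word, and it takes each component automaton to be exact (it accepts x precisely when f(x) exceeds its threshold). All names are illustrative, and returning 0 when no component accepts is a convention chosen here, since the text does not say what the maximum over an empty set should be.

```python
# Value of the toy word function, indexed by the state "last letter read".
VALUE = {"e": 0.0, "a": 1.0, "b": 0.5}
THRESHOLDS = [0.0, 0.25, 0.5, 0.75]

def component(lam):
    """Hypothetical automaton B_lam: accepts x iff f(x) > lam."""
    return {"start": "e", "final": {s for s, v in VALUE.items() if v > lam}}

def step(state, sym):
    # Every component only has to remember the last letter read.
    return sym

def product_state(components, word):
    """Run all components in parallel, as the product machine B does."""
    states = tuple(c["start"] for c in components)
    for sym in word:
        states = tuple(step(s, sym) for s in states)
    return states

def phi(components, thresholds, states):
    """Largest threshold whose component is in a final state (0.0 if none)."""
    accepted = [lam for c, lam, s in zip(components, thresholds, states)
                if s in c["final"]]
    return max(accepted, default=0.0)

COMPONENTS = [component(lam) for lam in THRESHOLDS]

def approx(word):
    return phi(COMPONENTS, THRESHOLDS, product_state(COMPONENTS, word))
```

With thresholds spaced 0.25 apart, the combined machine stays within 0.25 of the toy function on every word.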
Remark: Any PE f induced by a PA A can be transformed into a PCE 
$T(A, \lambda)$ for a cut-point $\lambda$. It follows from the above two propositions that f is 
$\epsilon$-approximable if and only if the derived PCEs $T(A, \lambda)$ are $\epsilon$-approximable 
for any cut-point $\lambda$. 


b. A Counterexample 


Consider the following PA $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ over $\Sigma = \{0, 1\}$ with 
$S = \{s_0, s_1, s_2, s_3\}$, $\pi = (1\ 0\ 0\ 0)$ and 


1 4 040 4400 
l H 1 1 
r= M1051 0 “li ooo 
0 000 ! 0100 
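By straightforward computations one can prove the following relations: 

$$p(x) \begin{cases} = (\tfrac12)^n & \text{if } x = 0^n,\ n = 0, 1, 2, \ldots \quad [0^0 \text{ is the empty word}] \\ = \tfrac12 & \text{if } x = 0^{n_1}1\,0^{n_2}1 \cdots 0^{n_k}1,\ n_j \ge 0,\ j = 1, 2, \ldots, k, \text{ and there is } i \text{ with } n_i = 0 \\ > \tfrac12 & \text{if } x = 0^{n_1}1\,0^{n_2}1 \cdots 0^{n_k}1,\ n_j > 0,\ j = 1, 2, \ldots, k \\ < \tfrac12 & \text{if } x = 0^{n_1}1\,0^{n_2}1 \cdots 0^{n_k}1\,0^{n_{k+1}},\ n_j > 0,\ j = 1, 2, \ldots, k,\ n_{k+1} > 0 \end{cases}$$

where p(x) is the (1, 1) entry in A(x). 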

Consider now the PE $p^A$ defined by A, and let $T(A, \lambda)$ be the PCE with 
$\lambda = \tfrac12$. We have that 

$$T(A, \lambda) = \{x : p^A(x) > \tfrac12\}$$

It follows from the above inequalities that $T(A, \lambda)$ for $\lambda = \tfrac12$ is the set of 
words x such that x is empty or x begins with a zero, ends with a one, and 
contains no subword of two or more consecutive ones. It is easily verified that 
this set of words is a regular set [there exists a finite automaton accepting it] 
and therefore it is $\epsilon$-approximable [even for $\epsilon = 0$] by a finite automaton. 

We shall show now that there is a $\lambda$ such that $T(A, \lambda)$ is not $\epsilon$-approximable 
by a finite automaton, with the result that the function $p^A$ is not approximable 
either, this following from Proposition 5.3. 

Let $x_n^m$ be the word $x_n^m = (0^n 1)^m$. One can prove, again using 
straightforward computation, that 

$$p^A(x_n^m) = \tfrac12 + \tfrac12\left[1 - (\tfrac12)^n\right]^m$$

Thus $\lim_{n \to \infty} p^A(x_n^m) = 1$ for fixed m > 0, while $\lim_{m \to \infty} p^A(x_n^m) = \tfrac12$ for fixed 
n > 0. Now let $\lambda$ be a real number $\tfrac12 < \lambda < 1$, say $\lambda = \tfrac34$, and let $\epsilon$ be a real 
number $\epsilon < \tfrac14$, and suppose that $T(A, \lambda)$ is $\epsilon$-approximable for the given $\lambda$ and 
$\epsilon$. Let the approximating machine have k states. Choose $n_0$ so great that 

$$p^A(x_{n_0}^m) > \lambda + \epsilon \quad \text{for } m = 1, 2, \ldots, k + 1$$

The first k + 1 applications of the input sequence $x_{n_0}^1 = 0^{n_0}1$ must send the 
approximating machine B through a sequence of states $s_0, s_1, \ldots, s_{k+1}$, which are all 
final states of B. But B has only k states, so that $s_i = s_j$ for some $i < j \le k + 1$, 
so that all the tapes of the form $x_{n_0}^m$, $m = 1, 2, \ldots$, will be in T(B). Thus B 
cannot $\epsilon$-approximate $p^A$ since there is $m_0$ with $p^A(x_{n_0}^{m_0}) < \lambda - \epsilon$, i.e., 

$$|p^A(x_{n_0}^{m_0}) - \lambda| > \epsilon$$

while $x_{n_0}^{m_0} \in T(B)$ and $x_{n_0}^{m_0} \notin T(A, \lambda)$. The following are direct consequences 
of the above example: 


1. There is a PCE which is not approximable by a regular event. 

2. There is a PE which is not approximable by a finite [deterministic] 
automaton [this follows from Proposition 5.3]. 

3. The PCE given in the above example with cut-point $\lambda = \tfrac12$ is $\epsilon$-approximable 
by a regular event, but the underlying PE $p^A$ is not $\epsilon$-approximable. The two 
concepts of approximation are not equivalent. 

4. The class of PCEs strictly includes the class of regular events even if 
comparison is based on $\epsilon$-approximation and not strict equivalence. 

5. There exists a PE f and $\epsilon$ such that there is no $\epsilon$-cover induced by it. 

6. There is a PCE $T(A, \lambda)$ and $\epsilon$ such that there is no $\epsilon$-cover induced by it. 
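
The counting step of the counterexample (a machine with k states must repeat a state while reading k + 1 copies of the same word) can be sketched as follows. The three-state automaton `DELTA` is a hypothetical example chosen for illustration; it is not the approximating machine of the text, which is only assumed to exist there.

```python
def run(delta, state, word):
    """State reached from `state` after reading `word`."""
    for sym in word:
        state = delta[(state, sym)]
    return state

def first_repeat(delta, start, word):
    """Return (i, j), i < j, such that word**i and word**j lead to the same state."""
    seen = {start: 0}
    state, count = start, 0
    while True:
        state = run(delta, state, word)
        count += 1
        if state in seen:
            return seen[state], count
        seen[state] = count

# Hypothetical automaton over {0, 1} that counts the ones modulo 3.
DELTA = {(q, "0"): q for q in range(3)}
DELTA.update({(q, "1"): (q + 1) % 3 for q in range(3)})
```

Once two powers of the word lead to the same state, the states visited after further copies cycle, so if every state on the cycle is final then all later powers are accepted as well.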


6. Some Nonclosure and Unsolvability Results 


The following notation will be used in this subsection: 

An RPA is a PA such that all the entries in the vectors $\pi$, $\eta$ and in the 
matrices $A(\sigma)$ are rational numbers. 

An ISA is an SPA such that all the entries in the vectors $\pi$, $\eta$ and in the 
matrices $A(\sigma)$ are integers. 

A P-event is an event E which can be represented in the form $E = T(A, \lambda)$ 
where A is an RPA and $\lambda$ is a rational number. Thus any P-event is a PCE. 

An E-event is an event E which can be represented in the form 

$$E = \{x : f^A(x) = f^B(x)\}$$

where A and B are RPA and $f^A$ and $f^B$ are the PEs induced by them. 

A D-event is an event E which can be represented in the form 

$$E = \{x : f^A(x) \ne f^B(x)\}$$

where A, B, $f^A$ and $f^B$ are as above. 


Lemma 6.1: Every E-event can also be represented in the form 

$$E = \{x : f^C(x) = \tfrac12\}$$

where C is an RPA and every D-event is an exclusive PCE (see Section 4.a). 


Proof: Set $f^C(x) = \tfrac12 f^A(x) + \tfrac12(1 - f^B(x)) = \tfrac12 + \tfrac12(f^A(x) - f^B(x))$, then use 
the construction in the proofs of Propositions 1.1, 1.2, and 1.5 in Section A 
to show that C can be chosen to be an RPA under the conditions of the 
lemma. 


Lemma 6.2: The set of P-events is equal to the set of events which can be 
represented in the form $T(A, \lambda)$ with A an ISA and $\lambda$ an integer. 


Proof: Any event of the form $T(A, \lambda)$ with A an ISA and $\lambda$ an integer is a 
P-event. This follows from the construction involved in the proofs of 
Proposition 1.1 and of the propositions and theorems on which that proposition is 
based. To prove the converse let $E = T(A, \lambda)$ be a given P-event. One proves 
easily (using a construction similar to the one used in the proof of Theorem 
A.2.4) that E can be represented also in the form $E = T(A', 0)$ where A' is an 
SPA but the entries in its matrices and vectors are still rational numbers. Let 
m be the absolute value of the least common multiple of all the denominators 
of all the entries in all the matrices and vectors of A' and let A'' be the 
SPA derived from A' by multiplying all its matrices and vectors by m. A'' is 
an ISA by construction and $f^{A''}(x) > 0$ if and only if $f^{A'}(x) > 0$, for every $x \in \Sigma^*$. Thus 
$T(A, \lambda) = T(A', 0) = T(A'', 0)$ as required. 
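
The denominator-clearing step of the proof can be sketched as follows, assuming that the word function of an SPA is computed as initial vector times letter matrices times final vector. The automaton `PI`, `MATS`, `ETA` is a hypothetical example with rational (and partly negative) entries, and the helper names are illustrative. Multiplying every vector and matrix by m multiplies the value on a word of length l by the positive number m^(l+2), so the sign, and hence the cut-point event at 0, is unchanged.

```python
from fractions import Fraction as Fr
from math import lcm  # available from Python 3.9

def value(pi, mats, eta, word):
    """pi * A(x_1) * ... * A(x_k) * eta for row vector pi and column vector eta."""
    row = list(pi)
    for sym in word:
        A = mats[sym]
        row = [sum(row[i] * A[i][j] for i in range(len(row)))
               for j in range(len(A[0]))]
    return sum(r * e for r, e in zip(row, eta))

def clear_denominators(pi, mats, eta):
    """Multiply all vectors and matrices by the lcm m of all denominators."""
    entries = list(pi) + list(eta) + [e for A in mats.values() for r in A for e in r]
    m = lcm(*(e.denominator for e in entries))
    def scale(vec):
        return [int(e * m) for e in vec]  # exact: m clears every denominator
    return m, scale(pi), {s: [scale(r) for r in A] for s, A in mats.items()}, scale(eta)

# Hypothetical rational automaton over {a, b}.
PI = [Fr(1), Fr(0)]
ETA = [Fr(1, 2), Fr(-1, 3)]
MATS = {
    "a": [[Fr(1, 2), Fr(1, 2)], [Fr(0), Fr(1)]],
    "b": [[Fr(1, 3), Fr(2, 3)], [Fr(1), Fr(0)]],
}
M, PI_INT, MATS_INT, ETA_INT = clear_denominators(PI, MATS, ETA)
```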


Theorem 6.3: The set of P-events is closed under complementation. 


Proof: Let $E = T(A, \lambda)$ be a P-event. Then, as in the proof of the previous 
lemma, $E = T(A', 0)$ where A' is an ISA, i.e., the values $f^{A'}(x)$ are integers for 
every $x \in \Sigma^*$. Thus $E = \{x : f^{A'}(x) > 0\}$ and $\bar{E} = \{x : f^{A'}(x) \le 0\} = \{x : 
f^{A'}(x) < 1\}$, for the values $f^{A'}(x)$ are integers. If $A' = (S, \pi, \{A'(\sigma)\}, \eta)$, let 
$A'' = (S, \pi, \{A'(\sigma)\}, -\eta)$. A'' is an ISA and $\bar{E} = \{x : f^{A''}(x) > -1\}$, which is a 
P-event by Lemma 6.2. 


Corollary 6.4: The set of E-events is a proper subset of the set of P-events. 


Proof: By Lemma 6.1 every E-event E can be represented in the form $E = 
\{x : f^A(x) = \tfrac12\}$ with A an RPA. Thus $\bar{E} = \{x : f^A(x) \ne \tfrac12\} = E'$. Now $E' = 
T_{\ne}(A, \tfrac12)$ is an exclusive PCE with A an RPA, and using the construction used 
in the proof of Proposition 4.2 one proves that E' is a P-event so that, by 
Theorem 6.3, $\bar{E'} = E$ is also a P-event. That the inclusion is proper follows by 
an argument similar to the argument used in the proof of Proposition 4.2. This 
part of the proof is left to the reader. 

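Lemma 6.5: Let $E_1 = T(A, \lambda)$ be a PCE (not necessarily a P-event) and let 
$E_2 = T(B, 0)$ be a regular event and let c be a symbol $c \notin \Sigma$. Then $E_1 c E_2$ 
and $E_2 c E_1$ are PCE. 

Proof: Let $A = (S, \pi, \{A(\sigma)\}, \eta^F)$ and let $B = (Q, \xi, \{B(\sigma)\}, \eta^G)$, where the 
vector $\xi$ and the matrices $B(\sigma)$ are degenerate stochastic. Let |S| = m and 
|Q| = n. Construct the following PA C, $C = (K, \zeta, \{C(\sigma)\}, \eta^C)$ where |K| = 
m + n + 1, 

$$C(\sigma) = \begin{pmatrix} B(\sigma) & 0 & 0 \\ 0 & A(\sigma) & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{for } \sigma \in \Sigma,$$

$\zeta = (\xi\ 0 \cdots 0)$, $\eta^C = (0 \cdots 0\ (\eta^F)^T\ 0)^T$ and 

$$C(c) = \begin{pmatrix} C_1(c) \\ \vdots \\ C_{m+n+1}(c) \end{pmatrix}$$

with 

$$C_i(c) = \begin{cases} (0 \cdots 0\ \pi\ 0) & \text{if the } i\text{-th state is a final state of } B \\ (0 \cdots 0\ 1) & \text{otherwise} \end{cases}$$

It is left for the reader to verify that $T(C, \lambda) = E_2 c E_1$. Thus, $E_2 c E_1$ is a 
PCE. In addition 

$$E_1 c E_2 = \widetilde{\tilde{E}_2\, c\, \tilde{E}_1}$$

which implies by Proposition 1.6 that $E_1 c E_2$ is also a PCE. 
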
If E is a set of words, then $E^*$ denotes the star closure of E and is defined as 

$$E^* = \bigcup_{i=0}^{\infty} E^i \quad \text{with } E^0 = \{e\},\ E^i = E^{i-1}E.$$

Lemma 6.6: Let $\Sigma = \{a, b\}$. The set of words 

$$E_1 = \{a^k b (a^* b)^* a^k b : k > 0\}$$

is a PCE. 


Proof: Consider the following SPA A — (S, z, {A(a)}.<x, N) with S={s,,..., 
So}, 2 —($40---0,m =(0--- 01 — 1)" and AGW) = [4,,(0)] with 


a(a) = anha) = asla) = asa) = 4, anla) = a3;(a) = $ 
,,(a) = asa) = $, a,(a) = ag(a) = as(a) = 1, 


a; (a) = 0 in all other cases and 
a, (b) = a(b) = au (b) = aj(b) = a(b) = as(b) = a(b) = a«(b) = $ 
as(b) = ag (b) = an(b) = ag(b) = as (b) = 1 
a; (b) = 0 in all other cases. 
It is left for the reader to verify that the following relation holds 

$$E_2 = \{x : f^A(x) = 0\} = E_3 \cup a^k b (a^* b)^* a^k b = E_3 \cup E_1$$

with $E_3$ a regular event. 

Using an argument similar to the one used in the proof of Corollary 6.4 one 
can easily prove that $E_2$ is a P-event and therefore a CPE. By Proposition 1.5 
$E_1$ is also a CPE (since $E_1 = E_2 - E_3$ and $E_3$ is regular). 

Lemma 6.7: Let $E_1$ be defined as in Lemma 6.6. The events $E_1\Sigma^*$ and $E_1^*$ are 
not CPEs. 

Proof: If an event $E = T(A, \lambda)$ is a CPE, then, by Theorem 2.8 Section II, 
C, for every word $x \in \Sigma^*$ there are constants $C_0, \ldots, C_n$ such that for any 
$y \in \Sigma^*$ the following equations hold true 

$$C_n f^A(x^n y) + C_{n-1} f^A(x^{n-1} y) + \cdots + C_0 f^A(y) = 0 \qquad (*)$$

$$C_n + C_{n-1} + \cdots + C_0 = 0 \qquad (**)$$

Assume now that $E_1\Sigma^*$ is a CPE, $E_1\Sigma^* = T(A, \lambda)$. Then $f^A$ satisfies the above 
property. Let x = a. For this x some of the coefficients in (*) are positive and 
some are not. Let $C_{i_1}, C_{i_2}, \ldots, C_{i_k}$ be the positive coefficients, and let $y = 
b a^{i_1} b a^{i_2} \cdots a^{i_k} b$. Then $f^A(a^j y) > \lambda$ if and only if $a^j y \in E_1\Sigma^*$, i.e., if and only 
if j has one of the values $i_1, i_2, \ldots, i_k$. 

Multiplying (**) by $\lambda$ and subtracting from (*) we get 

$$C_n (f^A(a^n y) - \lambda) + \cdots + C_0 (f^A(y) - \lambda) = 0 \qquad (***)$$

This leads to a contradiction since, by the above argument, $f^A(a^j y) - \lambda > 0$ if 
and only if $C_j > 0$, which would imply that the left-hand side of (***) is strictly 
positive. Thus $E_1\Sigma^*$ is not a CPE. The proof that $E_1^*$ is not a CPE is 
similar, but y = b(a"b)(a"b)* --- (a'*b)* in this case. | 

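Definition 6.1: Let $\Sigma$ and $\Delta$ be two alphabets and let $\Psi$ be a mapping $\Psi : \Sigma \to 
\Delta^*$. The natural extension of $\Psi$ of the form $\Psi(e) = e$ and, for $x = \sigma_1 \cdots \sigma_k 
\in \Sigma^*$, $\Psi(x) = \Psi(\sigma_1) \cdots \Psi(\sigma_k) \in \Delta^*$, is called a homomorphism from $\Sigma^*$ into 
$\Delta^*$. Given a homomorphism $\Psi : \Sigma^* \to \Delta^*$ and an event $E \subseteq \Sigma^*$, $\Psi(E)$ is the 
event $\Psi(E) = \{y \in \Delta^* : y = \Psi(x),\ x \in E\}$. (e here is the empty word.) 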
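
A minimal sketch of the natural extension, assuming words are represented as Python strings and the letter mapping as a dictionary. The names `extend` and `image` are illustrative, and the sample mapping (a and b kept, c erased) is the one used in the proof of Theorem 6.8.

```python
def extend(psi):
    """Natural extension of a letter mapping psi to words (the empty word maps to itself)."""
    def hom(word):
        return "".join(psi[sym] for sym in word)
    return hom

def image(hom, event):
    """Image of a finite set of words under the homomorphism hom."""
    return {hom(x) for x in event}

# Mapping with psi(a) = a, psi(b) = b, psi(c) = e (the empty word).
PSI = extend({"a": "a", "b": "b", "c": ""})
```

For example, `image(PSI, {"abcab", "c"})` is the set containing `"abab"` and the empty word.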

Theorem 6.8: The set of CPEs is closed neither under concatenation nor under 
concatenation closure nor under homomorphism. 

Proof: $E_1$ in Lemma 6.6 is a PCE and so is $\Sigma^*$ (any regular event is a PCE), 
but by Lemma 6.7 $E_1\Sigma^*$ and $E_1^*$ are not CPEs. This proves the first two 
statements of the theorem. Now, by Lemma 6.5, $E_1 c \Sigma^*$ is a PCE. Consider 
the natural extension of the following homomorphism: $\Psi(a) = a$, $\Psi(b) = b$, 
$\Psi(c) = e$. Then $\Psi(E_1 c \Sigma^*) = E_1\Sigma^*$, which is not a PCE. This completes the 
proof. 

Remark: It is well known that regular events are closed under the above 
operations. 


Exercise 6.9: Let E be the event consisting of the set of all words $a^i b a^{k_1} b a^{k_2} 
b \cdots a^{k_r} b$ such that $i, k_1, \ldots, k_r$ are nonnegative integers and, for some t 
($0 < t \le r$), $i = k_1 + \cdots + k_t$. Prove that E is context-free [see Ginsburg 
(1966)] but is not a CPE (compare with Theorem 1.10). 


Definition 6.2: (Ginsburg, 1966) A generalized sequential machine (GSM) is 
a 6-tuple $A = (S, \Sigma, \Delta, s_0, M, N)$ where S, $\Sigma$, $\Delta$ are finite sets (representing the 
states, input, and output symbols, respectively), $s_0$ is an element of S (the initial 
state), M is a function $M : S \times \Sigma \to S$ (the next state function) and N is a 
function $N : S \times \Sigma \to \Delta^*$ (the output function). 

The functions M and N are extended by induction to $S \times \Sigma^*$ by defining 
for every state s, every word $x \in \Sigma^*$ and every $\sigma \in \Sigma$ 

$$M(s, e) = s, \qquad N(s, e) = e$$

$$M(s, x\sigma) = M(M(s, x), \sigma), \qquad N(s, x\sigma) = N(s, x)\,N(M(s, x), \sigma)$$

The mapping $\Psi^A : \Sigma^* \to \Delta^*$ defined by $\Psi^A(x) = N(s_0, x)$, where N is the 
output function of a given GSM, is called a GSM mapping. 
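
The inductive extension of M and N amounts to a single left-to-right pass over the input word. The sketch below assumes the next-state and output functions are given as dictionaries keyed by (state, symbol). The two-state machine is a hypothetical example: it copies each a once in state 0 and twice in state 1, and each b is erased and toggles the state.

```python
def gsm_run(M, N, s0, word):
    """Return (M(s0, word), N(s0, word)) for the extended functions."""
    state, output = s0, ""
    for sym in word:
        output += N[(state, sym)]   # N(s, x sigma) = N(s, x) N(M(s, x), sigma)
        state = M[(state, sym)]     # M(s, x sigma) = M(M(s, x), sigma)
    return state, output

def gsm_mapping(M, N, s0):
    """The GSM mapping x -> N(s0, x)."""
    return lambda word: gsm_run(M, N, s0, word)[1]

# Hypothetical GSM over the input alphabet {a, b} and the output alphabet {a}.
M_EX = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}
N_EX = {(0, "a"): "a", (0, "b"): "", (1, "a"): "aa", (1, "b"): ""}
PSI_A = gsm_mapping(M_EX, N_EX, 0)
```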

Theorem 6.10: Let $\Psi^A$ be a GSM-mapping $\Psi^A : \Sigma^* \to \Delta^*$ and let $f^B$ be a PE 
$f^B : \Delta^* \to [0, 1]$. The product $\Psi^A \circ f^B$ defined as $(\Psi^A \circ f^B)(x) = f^B(\Psi^A(x))$ for 
$x \in \Sigma^*$ is a PE, i.e., there exists a PA C such that $f^C = \Psi^A \circ f^B : \Sigma^* \to [0, 1]$. 
If B is an RPA, then C can be chosen to be an RPA. 

Proof: Let $A = (S, \Sigma, \Delta, s_0, M, N)$ and $B = (Q, \pi, \{B(\delta)\}_{\delta \in \Delta}, \eta^B)$ with 
|S| = n and |Q| = m. Define the PA $C = (K, \zeta, \{C(\sigma)\}_{\sigma \in \Sigma}, \eta^C)$ as follows: 

$$K = S \times Q, \qquad |K| = m \times n, \qquad \zeta = (\pi\ \underbrace{0\ 0 \cdots 0}_{(n-1)m})$$

$\eta^C$ is a column vector consisting of n equal m-dimensional subvectors, every 
such subvector equal to $\eta^B$. For each $\sigma \in \Sigma$, $C(\sigma)$ is a square matrix of order 
$m \times n$ consisting of $n^2$ blocks $C_{pq}(\sigma)$, $p, q = 1, 2, \ldots, n$, each block $C_{pq}(\sigma)$ a 
square matrix of order m defined as follows: 

$$C_{pq}(\sigma) = \begin{cases} B(N(s_p, \sigma)) & \text{if } M(s_p, \sigma) = s_q \\ \text{a zero matrix of order } m & \text{otherwise} \end{cases}$$

If $N(s_p, \sigma) = e$, then $B(N(s_p, \sigma)) = B(e) = I$ = the unit matrix of order m. It is 
left to the reader to verify that C as defined above is a PA (and if B is an RPA, 
then so is C) and that it satisfies $f^C = \Psi^A \circ f^B$ as required. 
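
The block construction can be checked numerically on a small case. The sketch assumes GSM states numbered 0, ..., n-1 with state 0 initial, the block in position (p, q) equal to the matrix of the output word emitted from state p when the next state is q, and word functions computed as initial vector times matrices times final vector. The GSM and the rational PA below are hypothetical examples, and all names are illustrative.

```python
from fractions import Fraction as Fr
from itertools import product

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def word_matrix(mats, word, m):
    """Product of the matrices of the letters of word (identity for the empty word)."""
    P = [[Fr(int(i == j)) for j in range(m)] for i in range(m)]
    for sym in word:
        P = mat_mul(P, mats[sym])
    return P

def pe_value(pi, mats, eta, word):
    P = word_matrix(mats, word, len(pi))
    return sum(pi[i] * P[i][j] * eta[j]
               for i in range(len(pi)) for j in range(len(eta)))

def gsm_output(M, N, word, state=0):
    out = ""
    for sym in word:
        out += N[(state, sym)]
        state = M[(state, sym)]
    return out

def compose(n, M, N, sigma, pi, B, eta):
    """Build (zeta, C, eta_C) such that f^C(x) = f^B(gsm_output(x))."""
    m = len(pi)
    zeta = list(pi) + [Fr(0)] * ((n - 1) * m)
    eta_c = list(eta) * n
    C = {}
    for sym in sigma:
        big = [[Fr(0)] * (n * m) for _ in range(n * m)]
        for p in range(n):
            q = M[(p, sym)]
            block = word_matrix(B, N[(p, sym)], m)  # identity if the output is empty
            for i in range(m):
                for j in range(m):
                    big[p * m + i][q * m + j] = block[i][j]
        C[sym] = big
    return zeta, C, eta_c

# Hypothetical two-state GSM from {a, b}* into {0, 1}*.
GSM_M = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}
GSM_N = {(0, "a"): "0", (0, "b"): "1", (1, "a"): "", (1, "b"): "01"}
# Hypothetical two-state rational PA over {0, 1}.
PI, ETA = [Fr(1), Fr(0)], [Fr(0), Fr(1)]
B = {"0": [[Fr(1), Fr(0)], [Fr(1, 2), Fr(1, 2)]],
     "1": [[Fr(1, 2), Fr(1, 2)], [Fr(0), Fr(1)]]}
ZETA, C, ETA_C = compose(2, GSM_M, GSM_N, "ab", PI, B, ETA)
```

Since every entry is a `Fraction`, the two sides of the identity can be compared exactly on all short words.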


We are now able to establish some connections between CPEs and a 
certain type of context-free language. Familiarity with formal languages [e.g., 
Ginsburg (1966)] is a prerequisite for the following lemmas and theorems. 


Definition 6.3: A context free grammar $G = (V, \Sigma, P, t)$ (we use here the 
notation t for the start symbol, $t \in V - \Sigma$, instead of $\sigma$ which stands for an element 
of $\Sigma$) is deterministic linear (DL) if the productions in P satisfy the 
following requirements. 

(i) Each production has the form $v \to a\xi u$ or the form $v \to b$ with $v, \xi \in V - 
\Sigma$; $a, b \in \Sigma$; $u \in \Sigma^*$; (ii) if two productions $v_i \to a_i x$, $v_j \to a_j y$, with $v_i, v_j \in V - \Sigma$; 
$a_i, a_j \in \Sigma$; $x, y \in \{e\} \cup (V - \Sigma)\Sigma^*$, are such that $v_i = v_j$ and $a_i = a_j$, then 
also x = y. 

A language $L \subseteq \Sigma^*$ is DL if it is generated by a DL grammar. 
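
The two requirements can be checked mechanically. The sketch below assumes a production $v \to a\xi u$ is encoded as the tuple `(v, a, xi, u)` and a production $v \to b$ as `(v, b, None, "")`. The encoding and the function name are illustrative. The sample grammar is the one given for the first language in the proof of Lemma 6.13, as far as it can be read there, with the primed letters written as capitals and the second variable written `x`.

```python
def is_deterministic_linear(productions, nonterminals, terminals):
    """Check requirements (i) and (ii) of a DL grammar on encoded productions."""
    seen = {}
    for v, a, xi, u in productions:
        # (i) shape: v -> a xi u with u a terminal word, or v -> b.
        if v not in nonterminals or a not in terminals:
            return False
        if xi is None:
            if u != "":
                return False
        elif xi not in nonterminals or any(s not in terminals for s in u):
            return False
        # (ii) determinism: (v, a) fixes the rest of the right-hand side.
        if (v, a) in seen and seen[(v, a)] != (xi, u):
            return False
        seen[(v, a)] = (xi, u)
    return True

NONTERMINALS = {"t", "x"}
TERMINALS = set("abcdABC")  # A, B, C stand for the primed letters
P_EXAMPLE = [
    ("t", "a", "t", "A"), ("t", "b", "t", "B"), ("t", "c", "x", "C"),
    ("x", "a", "x", "A"), ("x", "b", "x", "B"), ("x", "d", None, ""),
]
```

Adding a second production for the pair `("t", "a")` with a different right-hand side makes the check fail, which is exactly what requirement (ii) forbids.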

Lemma 6.11: Every DL language is an E-event and therefore a PCE. 

Proof: Let L be a DL language generated by the grammar $G = (V, \Sigma, P, t)$. 
Let $A_G$ be the GSM $A_G = (S, \Sigma, \Delta, t, M, N)$ such that $S = (V - \Sigma) \cup \{f, d\}$, 
$f, d \notin V$; $\Delta = \Sigma$ and the definitions of M and N are given by: 

$$M(v, \sigma) = \begin{cases} \xi & \text{if } v \to \sigma\xi u \in P \text{ for some } u \in \Sigma^* \\ f & \text{if } v \to \sigma \in P \text{ or if } v = f \\ d & \text{otherwise} \end{cases}$$

$$N(v, \sigma) = \begin{cases} \tilde{u} & \text{if } v \to \sigma\xi u \in P \text{ for some } u \in \Sigma^* \\ e & \text{otherwise} \end{cases}$$

Let $A_G'$ be another GSM $A_G' = (S, \Sigma, \Delta, t, M, N')$. Thus $A_G'$ differs from $A_G$ 
only in the function N' which is defined as 

$$N'(v, \sigma) = \begin{cases} \sigma & \text{if } v = f \\ e & \text{otherwise} \end{cases}$$

Let $A_G^0$ be the finite automaton $A_G^0 = (S, \Sigma, t, M, F)$ where $F = \{f\} \subseteq S$, and 
all the other elements are as in $A_G$. 

Let $L_1 = T(A_G^0)$; $L_1$ is a regular event by definition. Let $x = x_1x_2 \in L_1$ 
where $x_1$ is the subword of minimal length of x such that $x_1 \in L_1$. If $x_1 = 
\sigma_1\sigma_2 \cdots \sigma_k$ then, by the definition of M, the following productions are in P: 

$$t \to \sigma_1\xi_1u_1,\quad \xi_1 \to \sigma_2\xi_2u_2,\quad \ldots,\quad \xi_{k-2} \to \sigma_{k-1}\xi_{k-1}u_{k-1},\quad \xi_{k-1} \to \sigma_k$$

It follows that $x = x_1x_2 \in L$ if and only if $x_2 = u_{k-1}u_{k-2} \cdots u_1$. In addition 
it follows from the definitions of N and N' that $\Psi^{A_G}(x_1x_2) = \tilde{u}_1\tilde{u}_2 \cdots \tilde{u}_{k-1}$ 
while $\Psi^{A_G'}(x_1x_2) = x_2$. Thus $x \in L_1$ implies that $x = x_1x_2 \in L$ if and only if 
$x_2 = u_{k-1} \cdots u_1$, or $\widetilde{\Psi^{A_G}(x)} = \Psi^{A_G'}(x)$. Let $f^A$ be the RPE induced by the 
m-adic, $m \ge 3$, PA (see Exercise 3.1) with A(1) = A(a) and A(2) = A(b) and all 
the other symbols in the alphabet of A are deleted. 

Let $f^{\tilde{A}}$ be the reverse of $f^A$, which is an RPE by the construction in the proof 
of Proposition 1.6. It follows from Theorem 6.10 that $\Psi^{A_G} \circ f^A = g_1'$ and $\Psi^{A_G'} \circ 
f^{\tilde{A}} = g_2'$ are both RPEs. Moreover, it follows from the above considerations 
that if $x \in L_1$, then $x \in L$ if and only if $g_1'(x) = g_2'(x)$. It is clear, from the 
definition of $A_G^0$, that if $x \notin L_1$, then $x \notin L$. Therefore, in order to complete 
the proof, one has to modify the functions $g_1'$ and $g_2'$ to $g_1$ and $g_2$ so that 
$g_i(x) = g_i'(x)$ if $x \in L_1$, i = 1, 2, but $g_1(x) \ne g_2(x)$ if $x \notin L_1$. Now $L_1$ is 
regular. Let h be the RPE such that h(x) = 1 if $x \in L_1$ and h(x) = 0 otherwise. 
Set $g_1 = g_1' \vee \bar{h}$ and $g_2 = g_2' \wedge h$. By the construction in the proof of 
Proposition A.1.9, $g_1$ and $g_2$ are both RPEs. One verifies easily that $g_1$ and $g_2$ 
satisfy the above requirements and the proof is complete. 


Lemma 6.12: Let $E_1$ and $E_2$ be two DL languages over the alphabets $\Sigma_1$ 
and $\Sigma_2$ respectively, $\Sigma_1 \cap \Sigma_2 = \emptyset$. Let $\delta$ be a letter $\delta \notin \Sigma_1 \cup \Sigma_2$. Then $E_1\delta E_2$ 
is an E-event (and therefore a CPE). 

Proof: We shall prove the lemma for $\Sigma_1 = \{a, b\}$ and $\Sigma_2 = \{a', b'\}$. The 
proof for the general case is similar. By Lemma 6.11, we can construct two 
RPEs for $E_1$, $g_1$ and $g_2$, such that $x \in E_1$ if and only if $g_1(x) = g_2(x)$. Taking 
$f^A$ in the proof of that lemma to be the RPE induced by the 9-adic PA will 
cause $g_1(x)$ and $g_2(x)$ to have the following properties: 

a. $g_1(x) = 1$ (if $x \notin L_1$, where $L_1$ is as in the proof of Lemma 6.11 for the 
given DL language, including the case x = e) 
or 
$g_1(x) = .\epsilon_1\epsilon_2 \cdots \epsilon_k > 0$ with $\epsilon_i = 1$ or $\epsilon_i = 2$; 
in any case $0 < g_1(x) \le 1$. 

b. $g_2(x) = 0$ (if $x \notin L_1$ as above, including the case x = e) 
or 
$g_2(x) = .\epsilon_1\epsilon_2 \cdots \epsilon_k < 1$ with $\epsilon_i = 1$ or $\epsilon_i = 2$; 
in any case $0 \le g_2(x) < 1$. 

Similarly we can construct two RPEs for $E_2$, $g_1'$ and $g_2'$, such that $x \in E_2$ if 
and only if $g_1'(x) = g_2'(x)$. We shall choose this time $f^A$ to be the RPE induced 
by the 9-adic PA, but with A(a') = A(3) and A(b') = A(6). $g_1'$ and $g_2'$ will have 
the properties: 

a'. $g_1'(x) = 1$ or $g_1'(x) = .\epsilon_1 \cdots \epsilon_k$, $\epsilon_i = 3$ or $\epsilon_i = 6$; 
$0 < g_1'(x) \le 1$. 

b'. $g_2'(x) = 0$ or $g_2'(x) = .\epsilon_1 \cdots \epsilon_k$, $\epsilon_i = 3$ or $\epsilon_i = 6$; 
$0 \le g_2'(x) < 1$. 

We shall extend first the functions $g_1$, $g_2$, $g_1'$, $g_2'$ to the functions $h_1$, $h_2$, $h_1'$, $h_2'$ 
such that the domain of the new functions will be $(\Sigma \cup \Sigma' \cup \delta)^*$, by adding 
to the underlying PA of each function unit matrices of due dimension for 
all symbols not included in the original domain of the specific function. 

We construct now the RPEs (Corollary A.1.7) $\tfrac12(h_1 + h_1') = \psi_1$ and $\tfrac12(h_2 + 
h_2') = \psi_2$. Finally, let $\chi$ be the characteristic function of the regular event 
$\Sigma^*\delta\Sigma'^* = E$ (i.e., $\chi(x) = 1$ if $x \in E$ and $\chi(x) = 0$ otherwise), and set $g = \psi_1 \vee \bar{\chi}$, 
$g' = \psi_2 \wedge \chi$. g and g' are RPEs [Proposition A.1.9] and g(x) = g'(x) if and 
only if $x \in E$, i.e., if and only if x has the form $x = z\delta y$, $z \in \Sigma^*$, $y \in \Sigma'^*$, 
and $\psi_1(x) = \psi_2(x)$, which happens if and only if $h_1(z\delta y) + h_1'(z\delta y) = 
h_2(z\delta y) + h_2'(z\delta y)$. This is equivalent to $g_1(z) + g_1'(y) = g_2(z) + g_2'(y)$, which 
is equivalent by properties (a), (b), (a'), and (b') to the equations $g_1(z) = g_2(z)$ 
and $g_1'(y) = g_2'(y)$, which hold true if and only if $z \in E_1$ and $y \in E_2$. Thus 
$x \in E_1\delta E_2$ if and only if g(x) = g'(x), and $E_1\delta E_2$ is therefore an E-event. 

The lemmas proved above will be used now to prove some undecidability 
results for CPE. Some of the subsequent results and their proofs are similar to 
the ones used in the theory of context-free languages and will be omitted (see 
e.g., Ginsburg, 1966, Chapter 4). 

Lemma 6.13: Consider the following languages: 

For $\Sigma = \{a, b, c, a', b', c'\}$, if $x \in \{a, b, c\}^*$, let x' be the word derived from 
x by replacing every occurrence of a letter in x by its primed counterpart, thus 
$x' \in \{a', b', c'\}^*$. 

Define $L_s = \{x c y d \tilde{y}' c' \tilde{x}' : x, y \in \{a, b\}^*\}$. Let $x = (x_1, \ldots, x_n)$ and $y = 
(y_1, \ldots, y_n)$ denote n-tuples of nonempty words in $\{a, b\}^*$. Define 

$$L(x) = \{a^{i_1} b \cdots a^{i_k} b\, c\, x_{i_k} \cdots x_{i_1} : k \ge 1,\ 1 \le i_j \le n\}$$

$$L(x, y) = L(x)\, d\, \widetilde{L(y')}, \qquad y' = (y_1', \ldots, y_n')$$

All three languages, $L_s$, L(x) and L(x, y), are E-events for any given x and y. 

Proof: $L_s$ is generated by the DL grammar 

$$G = (\{t, \xi, a, b, c, d, a', b', c'\}, \{a, b, c, d, a', b', c'\}, P, t)$$

where $P = \{t \to ata',\ t \to btb',\ t \to c\xi c',\ \xi \to a\xi a',\ \xi \to b\xi b',\ \xi \to d\}$. 
Therefore, by Lemma 6.11, $L_s$ is an E-event. L(x) is generated by the DL grammar 
$G = (\{t, \xi_0, \ldots, \xi_n, a, b, c\}, \{a, b, c\}, P, t)$ where $P = \{t \to a\xi_1,\ \xi_0 \to a\xi_1,\ 
\xi_1 \to a\xi_2,\ \ldots,\ \xi_{n-1} \to a\xi_n,\ \xi_0 \to c,\ \xi_1 \to b\xi_0x_1,\ \xi_2 \to b\xi_0x_2,\ \ldots,\ \xi_n \to b\xi_0x_n\}$ and 
therefore L(x) is also an E-event. As for the language L(x, y), one can use the 
same proof as the one used for Lemma 6.12 with $f^A$ replaced by $f^{\tilde{A}}$ in the 
definition of the functions $g_1'$ and $g_2'$ in order to show that L(x, y) is an 
E-event as well. 


Lemma 6.14: $L(x, y) \cap L_s$ is an E-event for given x and y and it contains no 
infinite context-free language. 

Proof: E-events are closed under intersection (see Exercise 4.a.3). The second 
statement is known (see Ginsburg, 1966). 


Lemma 6.15: Let $\tau$ be the homomorphism $\tau : \{a, b, c, d, a', b', c'\}^* \to \{a, b\}^*$ 
defined by $\tau(a) = ab$, $\tau(b) = a^2b$, $\tau(c) = a^3b$, $\tau(d) = a^4b$, $\tau(a') = a^5b$, $\tau(b') = 
a^6b$, $\tau(c') = a^7b$. Then $\tau(L(x, y) \cap L_s)$ for given x and y is a P-event and it 
contains no infinite context-free languages. 


Proof: One can easily construct a GSM mapping $\Psi^A : \{a, b\}^* \to \{a, b, c, d, 
a', b', c'\}^*$ such that $\Psi^A(x) = y$ if $\tau(y) = x$ and $\Psi^A(x) = e$ otherwise. By Lemma 
6.14, $L(x, y) \cap L_s$ is an E-event and therefore (by Corollary 6.4) a P-event 
of the form $T(B, \lambda)$ with B an RPA. By Theorem 6.10, $g = \Psi^A \circ f^B$ is an 
RPE.† For $x \in \{a, b\}^*$, if $x = \tau(y)$ for some y then $g(x) = f^B(\Psi^A(x)) = f^B(y)$, 
so that $g(x) > \lambda$ if and only if $x \in \tau(L(x, y) \cap L_s)$. Let $\chi$ be the characteristic 
function of the regular event $\{ab, a^2b, \ldots, a^7b\}^*$ and set $g' = g \wedge \chi$: g' is an 
RPE having the property that $\tau(L(x, y) \cap L_s) = T(g', \lambda)$. The second 
statement of the lemma is well known (see Ginsburg, 1966). 


Lemma 6.16: Each of the following is recursively unsolvable for arbitrary 
L(x, y): (a) whether $L(x, y) \cap L_s$ is empty, (b) whether $\tau[L(x, y) \cap L_s]$ is 
empty, where $\tau$ is as in Lemma 6.15. 


Proof: This result is well known (see Ginsburg, 1966). 


Theorem 6.17: Let $\Sigma$ contain at least two elements. It is recursively unsolvable 
to determine for arbitrary P-events $T(A, \lambda)$ over $\Sigma$ (a) whether $T(A, \lambda)$ is empty, 
(b) whether $T(A, \lambda) = \Sigma^*$, (c) whether $T(A, \lambda)$ is regular, and (d) whether 
$T(A, \lambda)$ is context free. 


Proof: By Lemma 6.15, $\tau(L(x, y) \cap L_s)$ is a P-event and it can be proved 
that it is either empty or infinite (see Ginsburg, 1966). Therefore Lemma 6.16 
implies (a). $\Sigma^* - \tau(L(x, y) \cap L_s)$ is a P-event (Theorem 6.3), which implies 
(b). Furthermore, $\tau(L(x, y) \cap L_s)$ is regular (and therefore also context free) 
if and only if it is empty, by Lemma 6.15. Therefore Lemma 6.16 implies also 
(c) and (d). 


Exercise 


By Lemma 6.15, $\tau(L(x, y) \cap L_s)$ can be represented in the form $\tau(L(x, y) \cap 
L_s) = \{x : p(x) > \lambda\}$ for some RPE p. Let q be the RPE $q(x) = \lambda$ for all $x \in 
\Sigma^*$. Prove: $p \wedge q$ and $p \vee q$ are RPEs if and only if $\tau(L(x, y) \cap L_s)$ is empty. 
(This implies that it is recursively unsolvable to determine, for arbitrary RPEs 
p and q over $\Sigma^*$, with $|\Sigma| \ge 2$, (1) whether $p \vee q$ is an RPE, (2) whether 
$p \wedge q$ is an RPE.) 


†An RPE is a PE $g^C$ such that the underlying PA, C, is an RPA. 
EXERCISES 


1. A word function f is called quasidefinite if it has the following property: 
For any $\epsilon$, there exists an integer $k(\epsilon)$ such that for any x with $l(x) > k(\epsilon)$ the 
inequality $|f(x) - f(y)| < \epsilon$ holds, where y is the $k(\epsilon)$-suffix of x. 
Prove that any quasidefinite function is $\epsilon$-approximable by a finite automaton 
[for any given $\epsilon$]. 
2. Let $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ be a PA over $\Sigma = \{a, b\}$ with $S = \{s_0, s_1\}$, $\pi = (1\ 0) 
= (\eta^F)^T$ and the transition matrices are 
A(a) ee A(b) 
257 hoc sd u 
and let $p^A$ be the PE defined by A. 
Prove that $p^A$ is not quasidefinite but $p^A$ is $\epsilon$-approximable for any $\epsilon > 0$ by 
a finite [deterministic] automaton. 


i 
b 
0j 


3. Let f be a word function and consider a relation $P_\epsilon$ induced by f over $\Sigma^*$, 
as follows: $xP_\epsilon y$ if and only if for all $z \in \Sigma^*$, $|f(xz) - f(yz)| < \epsilon$ (thus $xP_\epsilon y$ 
implies that $|f(x) - f(y)| < \epsilon$). 

a. Prove that $P_\epsilon$ is symmetric, reflexive, and right invariant. 

b. Prove that any word function f which is $\epsilon$-approximable by a finite 
automaton induces a $P_{2\epsilon}$ relation of finite index. 

c. If the word function f is defined by a PA, then the relation $P_\epsilon$ is of finite 
index k with 

k«(14- 1) 
«(rz 
where n is the number of states of the PA defining f. 


4. Let f be a word function and $\lambda$ a cut-point. The relation $R_\epsilon$ induced by f 
and $\lambda$ is defined as follows: $xR_\epsilon y$ if and only if for any $z \in \Sigma^*$, $|f(xz) - \lambda| > \epsilon$ 
and $|f(yz) - \lambda| > \epsilon$ implies that $f(xz) > \lambda$ if and only if $f(yz) > \lambda$. 


Prove: 

a. The relation $P_\epsilon$ defined in Exercise 3 above is a refinement of the relation 
$R_\epsilon$ here. 

b. If f is induced by a PA, then for any $\lambda$ and $\epsilon > 0$, $R_\epsilon$ is of finite index. 

c. If the event $E = \{x : f(x) > \lambda\}$ is $\epsilon$-approximable by a regular event then 
$R_\epsilon$ is of finite index. 


5. A cut-point event $E = \{x : f(x) > \lambda\}$, where f is a word function, is 
quasidefinite if for any $\epsilon$ there is an integer $k(\epsilon)$ such that for any x with $l(x) > k(\epsilon)$ 
and any $y \in \Sigma^*$ we have that $x \in E$ implies that $f(yx) > \lambda - \epsilon$ and $x \notin E$ 
implies that $f(yx) < \lambda + \epsilon$. 

a. Prove that if f is a quasidefinite function [see Exercise 1] then the 
cut-point event $E = \{x : f(x) > \lambda\}$ is quasidefinite for any $\lambda$. 

b. Prove that the converse of the above statement is not true by considering 
the function induced by the PA $A = (\pi, S, \{A(\sigma)\}, \eta^F)$ with $\pi = (1\ 0) = (\eta^F)^T$, 


E = (a, b} and 
0 Nr 
Aer lo i| a B ij 


with E = T(A, 3). 


6. Prove that if f is a quasidefinite function [see Exercise 1] then any cut-point 
event of the form $E = \{x : f(x) > \lambda\}$ is $\epsilon$-approximable by a regular event. 


7. Prove that any quasidefinite cut-point event [see Exercise 5] is 
$\epsilon$-approximable by a regular event. 


OPEN PROBLEMS 


1. Characterize the word functions which are $\epsilon$-approximable by push down 
automata. 


2. Characterize the events which are $\epsilon$-approximable by context free languages. 
3. Is the class of PCE $\epsilon$-approximable by context free languages? 
Chapter IV 


Applications 
and 
Generalizations 


INTRODUCTION 


This part contains an extended survey of most known papers dealing with 
applications and generalizations of probabilistic automata theory. 

There have been some attempts to apply the theory of probabilistic automata 
to other disciplines. These attempts are, however, still in the beginning 
stages. We choose therefore to supply the reader with an extended bibliography 
including explanatory remarks as to the nature or direction of the intended 
application or generalizations. 


A. INFORMATION THEORY 


One of the motivations for studying probabilistic sequential machines [see e.g. 
Carlyle (1963a)] was the fact that communication channels (Shannon and 
Weaver, 1968) can be represented as stochastic sequential machines. The 
topics studied in connection with the theory of information using probabilistic 
machines are: probability structure of channels: Carlyle (1963a, b), Onicescu 
and Guiasu (1965), Thomasian (1963), Wolfowitz (1963); encoding and 
decoding of finite state channels: Ott (1966a, b), Viterbi (1967), Guiasu (1968), 
Viterbi and Odenwalder (1969). Other related references: Blackwell et al. 
(1958), Huffman (1952), Fano (1961), Shannon (1957), Paz (1965), Souza 
et al. (1969). 


B. RELIABILITY 


When a deterministic automaton has some unreliable elements, its external 
behavior is probabilistic; thus, another motivation for studying probabilistic 
automata was the reliability problem. In connection with this aspect, the reader 
is referred to Von Neumann (1956) and Rabin (1963). An additional interesting 
reference can be found in the book of Cowan and Winograd (1963). Many 
authors working in reliability theory have attempted to construct reliable 
networks using unreliable components but the resulting network was always of 
the "definite" type. Cowan and Winograd showed that this is not a coincidence. 
They showed that, as a result of the axioms imposed on the network and the 
unreliable behavior of its components, the resulting probabilistic automaton 
satisfies the conditions of Theorem 4.16 of Rabin in Section III.B and 
therefore the reliable network must be definite. Additional related bibliography: 
Arbib (1965), Harrison (1965), Tsertzvadze (1966), Germanov (1966). 


C. LEARNING THEORY AND PATTERN RECOGNITION 


Still another motivation for studying stochastic automata was the possibility of
using them as models of learning and pattern recognition systems [e.g., Tsetslin
(1961), Schreider (1962), Bruce and Fu (1963)]. The model used by Tsetslin
consists of a deterministic automaton subject to a probabilistic training
process. The input to the deterministic automaton is random and represents
the reaction of a medium (“teacher”) to the performance of the automaton.
Two inputs are possible, 1 (representing a penalty) and 0 (representing a
nonpenalty), and the medium inserts its next input to the automaton in
a random way, the probability of a penalty or nonpenalty depending on the
present state. Let $\{s_i\}_{i=1}^{n}$ be the set of states of the (deterministic)
automaton and let $p_i$ be the probability of receiving a penalty in state $s_i$.
The automaton is called expedient if its expectation (in the long run) for
receiving a penalty is less than the average of the $p_i$. It is easy to see that
the model corresponding to the above description is a probabilistic automaton
with a single letter in the alphabet (the $i$th row of its single transition matrix
is the convex combination of the $i$th rows of the transition matrices of the
deterministic automaton corresponding to the inputs 1 and 0 respectively, and the
coefficients of the combination are $p_i$ and $1 - p_i$). Tsetslin, who initiated the
study of expediency (as explained above) of deterministic automata in random
media, was followed by many authors who extended and generalized his approach,
allowing for changes in the transition probabilities induced by controlled
learning, and using reinforcement algorithms: Bush and Mosteller (1955),
Bruce and Fu (1963), Tsertsvadze (1963), Varshavskii and Vorontsova (1963),
Fu and McMurtry (1965), Fu and McLaren (1965), Vorontsova (1965),
McMurtry and Fu (1966), and Fu and Wee (1967). Other related bibliography:
Suppes and Atkinson (1960), Braines and Svechinsky (1962), Krulee
and Kuick (1964), Vaisborg and Rosenstein (1965), Sklansky (1966), Fu
(1966, 1967), Wee and Fu (1969), Gelenbe (1969b).
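
The expediency test described above can be sketched numerically. The following
is only an illustrative sketch, not the construction of any of the cited authors:
the two-state automaton ("stay on a nonpenalty, switch on a penalty"), the
penalty probabilities, and all function names are assumptions made for the
example, and the long-run distribution is approximated by simple iteration,
which presupposes that the resulting chain converges.

```python
def medium_chain(next_on_0, next_on_1, penalty):
    """Single-letter chain: row i = (1 - p_i) * (row i for input 0) + p_i * (row i for input 1)."""
    n = len(penalty)
    chain = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(penalty):
        chain[i][next_on_0[i]] += 1.0 - p   # nonpenalty, input 0
        chain[i][next_on_1[i]] += p         # penalty, input 1
    return chain


def long_run_distribution(chain, steps=200):
    """Approximate the limiting state distribution by repeated multiplication."""
    n = len(chain)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [sum(dist[i] * chain[i][j] for i in range(n)) for j in range(n)]
    return dist


def expediency(next_on_0, next_on_1, penalty):
    """Return (long-run expected penalty, average of the p_i, expedient?)."""
    dist = long_run_distribution(medium_chain(next_on_0, next_on_1, penalty))
    expected = sum(d * p for d, p in zip(dist, penalty))
    average = sum(penalty) / len(penalty)
    return expected, average, expected < average


# Hypothetical automaton: keep the state on a nonpenalty, switch on a penalty.
next_on_0 = [0, 1]
next_on_1 = [1, 0]
penalty = [0.2, 0.8]
expected, average, expedient = expediency(next_on_0, next_on_1, penalty)
```

For this example the automaton spends most of its time in the state with the
smaller penalty probability, so its long-run expected penalty falls below the
average of the $p_i$.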


D. CONTROL 


It occurred to several authors that control systems [e.g., Eaton and Zadeh
(1962)] can be modeled by stochastic machines, with input symbols representing
commands, after some additional structure is added to take care of the
costs associated with the transitions between the states. In this representation,
a policy is a function associating commands to the states of the system, and the
policies are characterized by their expected costs. Some results in control
theory using this interpretation can be found in the works of Page (1965) and
Arbib (1966). Other related bibliography: Zadeh (1963b), Schreider (1962),
Pospelov (1966), Kalman (1968), Kalman et al. (1969).
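
As a rough illustration of this representation, the sketch below evaluates the
expected cost of a policy over a finite number of steps. It is not taken from
the cited works: the finite horizon, the two commands "a" and "b", the cost of
one unit for every transition into state 1, and all names are assumptions
chosen for the example.

```python
def expected_cost(transitions, costs, policy, start, horizon):
    """Expected total cost over `horizon` steps under a fixed policy.

    transitions[c][i][j]: probability of moving from state i to j under command c
    costs[c][i][j]:       cost attached to that transition
    policy[i]:            command issued in state i
    """
    n = len(policy)
    dist = [0.0] * n
    dist[start] = 1.0
    total = 0.0
    for _ in range(horizon):
        nxt = [0.0] * n
        for i in range(n):
            c = policy[i]
            for j in range(n):
                weight = dist[i] * transitions[c][i][j]
                total += weight * costs[c][i][j]
                nxt[j] += weight
        dist = nxt
    return total


# Hypothetical two-state machine with two commands.
transitions = {
    "a": [[0.9, 0.1], [0.5, 0.5]],
    "b": [[0.5, 0.5], [0.1, 0.9]],
}
# Assumed cost structure: entering state 1 costs one unit.
costs = {c: [[0.0, 1.0], [0.0, 1.0]] for c in transitions}

cost_a = expected_cost(transitions, costs, ["a", "a"], start=0, horizon=2)
cost_b = expected_cost(transitions, costs, ["b", "b"], start=0, horizon=2)
```

Comparing `cost_a` with `cost_b` ranks the two policies by expected cost, which
is the sense in which policies are characterized above.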


E. OTHER APPLICATIONS 


A connection between stochastic automata and the problem of time sharing in
computer programming has been established by Kashyap (1966), and the theory
of functions of Markov chains has been used by Fox and Rubin (1965) for
statistical inference (for evaluating cloud cover: estimation of parameters
and goodness of fit based on Boston data). See also Lewis (1966).


F. EXTENSIONS AND CONNECTIONS TO OTHER THEORIES 


Probabilistic extensions of Turing machines have been studied by De Leeuw et
al. (1956), Santos (1969), and Ellis (1969). Probabilistic extensions of
context-free languages have been considered by Salomaa (1969b) and Ellis (1969).
Probabilistic extensions of time-variant machines have been studied by Turakainen
(1969a). Tree automata with a probabilistic structure have been studied by
Magidor and Moran (1969), Paz (1968a), and Ellis (1969). Some properties of
fuzzy automata similar to properties of probabilistic automata have been
established by Santos and Wee (1968) and by Mizumoto et al. (1969). An
approach to stochastic automata and systems from the point of view of the
theory of categories can be found in the works of Heller (1967) and Depeyrot
(1968). Finally, some connections with dynamic programming have been
established by Feichtinger (1968) [see also Howard (1960)]. Additional
references: Wing and Demetrious (1964), Warfield (1968), Tou (1968), Li and Fu
(1969).

Answers and Hints to Selected Exercises


SECTION I, A.1 


l.a. 
A(v|u) = E A n(v|u) = 2 
8 8. L2. 
m(v|u) = ($4), — mv) = (44), — po à 
l.b. 


1 0 4 
Aoi) — | i (v|u) = H 


n(v|u) = (0, 0) z(v|u)—not defined, p,Av|u) = 0 = p.(aba|100) 
5.a. 9/16 
5.b. 87/128 


6. A(a|1) has negative entries.
A(b|1) has entries bigger than 1.
A(a|0) + A(b|0) is not a stochastic matrix.




SECTION I, A.2


5.a. p,(abb|010) = 0. 
5.b. q(a|011, bb) = (1/4)m + (1/2)n, where (m, n) is the initial distribution.


5.c. r(a|1101) = 13m + $n. 
7. 


3 


1 1 0 2 
A(a) =|* i 10 =|, i Ia x 
e-i 1. w=] ;} "|, 


SECTION I, A.3 


2. 
[010 1 0 
AO)=3/0 O 1|+40 1 
100 0 0 
m10 0 0 0 
A(1)—80 1 O0O|+4#1 0 
0 0 1 1 0 
|Z| = 6 
ge f i $ 500 | 
0 £0 0 1 |! 
4. |w^^| = 10; [w] = 5. 
— hn


1 


cr N 


oor =- CO © 


A 
1 
3. 


= = O O O = 


© m 


Aw Ap 


3. Hint: Let $M$ be a machine over an input alphabet $X$ with $|X| = m - 1$,
where $m$ is the number of columns in the given matrix $H$, and an output
alphabet $Y$ with $|Y| = 2$. Define the matrices $A(y|x)$ as follows:
$A(y_1|x_i)$ has its first column equal to the $i$th column of $H$, all its other
entries being zero, and $A(y_2|x_i)$ has its first column such that, with all the
entries in the other columns zero, $A(y_1|x_i) + A(y_2|x_i)$ is stochastic.


SECTION I, B.2 


1. Reduced form: 
i 4 0 0 à ¢ 
Ayl)-2|$à 4 9. ÀA»l92|0 4 4 
w $0 0$ $ 


Minimal form: 


1 1 I 3 
Alil) = B | Alix) = E iil 
$ 312 16 16. 
(2; 0 0 4) is a distribution which is equivalent to the distribution 
G + & 9 
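5. Given that $f_i = \sum_{k} a_k f_k$, $\sum_{k} a_k = 1$, $a_k \geq 0$, we have that
$f_i(1 - a_i) = \sum_{k \neq i} a_k f_k$, or
$f_i = \sum_{k \neq i} [a_k/(1 - a_i)] f_k$ [since $f_i$ is not extremal,
$0 < 1 - a_i \leq 1$], and

$$\sum_{k \neq i} \frac{a_k}{1 - a_i} = \frac{1 - a_i}{1 - a_i} = 1.$$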
11.a. Hint: Let $F$ denote the given flat and let $a_0$ be an element of $F$; prove
that the set $L = \{a - a_0 : a \in F\}$ is a linear space.

b. Hint: Define the equivalence $R$ over $\mathscr{P}$ by
$\pi R \rho \Leftrightarrow \pi H^M = \rho H^M$, where $\pi, \rho \in \mathscr{P}$.
Show that $R$ is right invariant and the set of equivalence classes
is closed under convex combinations. Show also that each equivalence class is
closed under convex combination of its elements. The rest of the proof is
straightforward.


SECTION I, B.3 


1. All the rows of $H^M$ are different vertices of the $|S|$-cube, and no vertex of
the cube is a convex combination of other vertices.


3. M* > M, M > M*. 


8. Let v, represent the ith row in H™, then the faces are: (v,, v2}, (v; v3}, {03 vi), 
{vs, vil, (vs, v}, (vs, v3}, fv, v4], (vi Vis vs}, fv, V4, vs}, (v. V35 vs}, (v;, 95, vst, 
[oi Vas V3, Va}, (01, V2. V3, Vs}. The faces containing two or three vertices above 
are simplexes. 

11. $H^M$ has two columns, one of them having all its entries equal to 1;
therefore, there are two rows in $H^M$ such that all the other rows are convexly
dependent on those two rows. The theorem follows from Theorem 2.8 in the
previous section.


SECTION I, B.4 


3. M* > M, M M*. 


SECTION I, B.6 


2. Yes. 
3. 


1 
HM = ; 
1 
1 
3.b. The trivial machine $M^*$ with a single state and such that
$A^*(0|0) = A^*(0|1) = A^*(0|2) = 1$, $A^*(1|0) = A^*(1|1) = A^*(1|2) = 0$
satisfies the conditions.


5. Proof: Let $h_{i_1}, \ldots, h_{i_m}$ be $m$ linearly independent rows of $H^M$.
As rank $H^M = m$, all the other rows of $H^M$ are linearly dependent on
$h_{i_1}, \ldots, h_{i_m}$. Let $\xi(y|x)$ be a row in a matrix of $M$. There is a
vector $\xi'(y|x)$ having nonzero entries only in columns corresponding to the
indexes $i_1, \ldots, i_m$ [the entries in $\xi'(y|x)$ may now assume negative
values or values bigger than 1] and such that $\xi(y|x)H^M = \xi'(y|x)H^M$, since
$\xi(y|x)H^M$ is a vector which represents a linear combination of the rows of
$H^M$.

Using an argument similar to the one used in the proof of Theorem 2.3, we
see that the machine $M'$, defined as the machine derived from $M$ by replacing
all vectors $\xi(y|x)$ in the matrices $A(y|x)$ by the corresponding vectors
$\xi'(y|x)$, is state equivalent to $M$, and all the columns in the matrices
$A'(y|x)$ corresponding to indexes other than $i_1, \ldots, i_m$ are zero columns.
The machine $M'$ can now be reduced to an equivalent $m$-state machine $M''$, for
only the states corresponding to the indexes $i_1, \ldots, i_m$ are accessible in
$M'$.


SECTION I, B.7


l.a. 

TOUT 

+040 | ££ 

1100 1 4 0 

Gun — |? ? HY =| * 

1 14 2 QP j 3 3 

4 4 39 4 3 

$044 132 


2. Consider formula (28). The number of linearly independent rows in [K^] 
equals rank H“™” therefore there must be at least that many rows in K™* so 
that rank M* > rank H^, 


4. n* = (4,4 


i 0 1 0 
400) =| A æa =| | 


4 0 
0 £i 0 $ 
en =| 4 ^D 6 1 
5. 
fo 0 0 [i 4 0] 
A*(0O = |i i 0j, A*(1 2/0 0 0 
0 0 0 Lz 2 0 
[000 fi 2 0] 
A*(0) —2|i 2 0), A*(11)) 2|0 0 0 
4 2 0 10 0 OJ 
z*—[i 0 i] 
No further reduction is possible in this case. 
6. 
1 1 1 
H^ = Ep = HM» 
10 1 
100 


and there is no convex polygon inside the unit cube with fewer than four
vertices in the two-dimensional plane which covers the unit cube [the rows of
$H^M$, ignoring the first coordinate].


SECTION I, C.2


1. By definition $p(\lambda|\lambda) = 1$ and any compound sequence determinant of
order 2 is equal to zero. Thus

$$\begin{vmatrix} p(\lambda|\lambda) & p(v'|u') \\ p(v|u) & p(vv'|uu') \end{vmatrix} = 0$$

or $p(vv'|uu') = p(v|u)\,p(v'|u')$.
| fo to Tv
P=! , t? Q=]; i 
10 10. 4 20 


5 1 
— 3 T AM 0 — $ $ AM 1 poem 0 0 
4.b. It is easily seen that AM(v), for any v € Y* has the form 
a 00 


where $a > 0$, so that $\eta^M(v) = A^M(v)\eta$ is a column vector whose first
entry has a positive and $< 1$ value for any $v \in Y^*$. As the sum of the
matrices $A^M(0) + A^M(1)$ has row sums equal to 1, this is true also for the sums

$$\sum_{v:\, l(v) = k} A^M(v), \quad \text{so that} \quad \sum_{v:\, l(v) = k} p_\pi^M(v) = 1.$$

The required statement is proved now by induction on the length of $v$. For
$l(v) = 1$, the proof results from straightforward computation. We consider now
$p_\pi^M(1v)$ and $p_\pi^M(0v)$ for $l(v) = k$.


i 0 Ol[a(v) 
p, (1v) = nA) = [$ as tol] 9 0| b) 
2 0 Oljic(») 


I 
& 
~~ 
2 
= 
ej 
J- 
J- 
pen 
Aan c& wp 
thus, 0 < p, (1v) < p,"(1) < 1. Considering now the value p, "(0v), we have
that 

0 O1 a(v) 
—i 0 || bw) 

0 —ilicQ) 

= $ 2a(v) — dy áb(v) — d telo) 

It is clear from the definitions of the matrices that either both b(v) and c(v) are 
> 0 or both values are < 0. In the second case, p,“(Ov) > 0. In the first case 
p, (v) = 4a(v) — 4yb(v) — elw) > 0 by the induction hypothesis. There- 
fore, 


1 
2 
P” (w) = [$ dv ol] 
0 


$a(v) > oblo) + foco) 
or 
$ $a(v) > à tobo) + roc) > robe) + Toe) 
or 
p," (0v) > 0 


It follows that p, (0v) > 0 and p, (1v) > 0. But Y, p,"(0v) + p, (1v) = 1; 
and, therefore, both values are also <1. 


4.c. Hint: Use eigenvalue considerations. 


9. Hint: Use the nullity laws of Sylvester and the fact that the ranks of the
spaces of the vectors $\pi(v|u)$ and $\eta(v|u)$ grow strictly when $l(v, u)$ grows
or else the ranks do not grow any more.


10. Hint: Use Exercise 9. 


SECTION II, A.1 


4.d. Let P = (p) be the matrix such that p, = 4 ifj —iorj —i-L- 2 and 
pi; = O otherwise. Show that Ó(P") = 1 for n = 1,2,... but lim,... d(P") = 0. 


4.e. $\delta(P) = 0$ implies that $P$ is constant.


6. Let E be the matrix all the rows of which are equal to some row of P then 
Q = P — E = P — RP where R is a matrix having a column of ones all the 
other columns being zero columns. Now use Corollary 1.5. 


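11. Use induction. If $n = 2$, then

$$\|A_1 A_2 - \bar{A}_1 \bar{A}_2\| \leq \|A_1 A_2 - \bar{A}_1 A_2\| + \|\bar{A}_1 A_2 - \bar{A}_1 \bar{A}_2\| = \|(A_1 - \bar{A}_1) A_2\| + \|\bar{A}_1 (A_2 - \bar{A}_2)\|$$

$$\leq \|A_1 - \bar{A}_1\|\,\|A_2\| + \|\bar{A}_1\|\,\|A_2 - \bar{A}_2\| \leq 2\epsilon$$

by Exercise 10 and by the assumption.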

12. $\|P - E_{i_0}\| = \|P - RP\| \leq 2\delta(P)$ by Corollary 1.5, where $R$ is a
matrix whose $i_0$th column is a column of ones, the other columns being zero
columns. Thus $\delta(P) \geq \tfrac{1}{2}\|P - E_{i_0}\|$. On the other hand,
$2\delta(P) = \sup_i \|P - E_i\|$, which proves the second part of the exercise.

13.a. $\delta(P) = 0$ implies that $P$ is constant, and an infinite constant matrix
cannot be doubly stochastic.


13.b. A matrix $P$ is doubly stochastic if and only if $P$ and $P^T$ are both
stochastic. If $P$ and $Q$ are doubly stochastic, then $PQ$ is stochastic and
$(PQ)^T = Q^T P^T$ is a stochastic matrix because $Q^T$ and $P^T$ are stochastic.
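
The argument in 13.b can be checked on a small finite case. The sketch below is
only an illustration under assumptions: the two 3 x 3 matrices and the helper
names are chosen for the example, and exact rational arithmetic is used so that
the row and column sums can be compared with 1 without rounding.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def is_stochastic(a):
    """Nonnegative entries and every row summing to 1."""
    return all(all(x >= 0 for x in row) and sum(row) == 1 for row in a)


def is_doubly_stochastic(a):
    """Both the matrix and its transpose are stochastic."""
    return is_stochastic(a) and is_stochastic(transpose(a))


# Hypothetical doubly stochastic matrices.
P = [[F(1, 2), F(1, 2), 0], [F(1, 2), 0, F(1, 2)], [0, F(1, 2), F(1, 2)]]
Q = [[F(1, 3), F(2, 3), 0], [F(2, 3), 0, F(1, 3)], [0, F(1, 3), F(2, 3)]]

PQ = matmul(P, Q)
product_is_doubly_stochastic = is_doubly_stochastic(PQ)
transpose_rule_holds = transpose(PQ) == matmul(transpose(Q), transpose(P))
```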

13.c. One proves easily that $EP^n = E$; thus
$\|P^n - E\| = \|P^n - EP^n\| \leq \delta(P^n) \to 0$.

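13.d. If the statement is not true, then for some $0 < \epsilon < 1$ there is $n$
such that $\delta(P^n) < \epsilon < \tfrac{1}{2}$. There is $k$ such that for a given
$i_0$, $\sum_{j=1}^{k} p_{i_0 j}^{(n)} > 1 - \epsilon$. But for any $i_0$ and $i$

$$\left| \sum_{j=1}^{k} \left( p_{i_0 j}^{(n)} - p_{ij}^{(n)} \right) \right| < \epsilon.$$

This implies that $\sum_{j=1}^{k} p_{ij}^{(n)} > 1 - 2\epsilon$, $i = 1, 2, \ldots$, or

$$\sum_{i=1}^{\infty} \sum_{j=1}^{k} p_{ij}^{(n)} = \infty \quad \text{but} \quad \sum_{i=1}^{\infty} \sum_{j=1}^{k} p_{ij}^{(n)} = k,$$

a contradiction.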


14. Use the fact that if $Q = \lim_{n \to \infty} P^n$ then $PQ = QP = Q$.


SECTION II, A.2 


3. 
aP, — zl| < |2Em — zli + Hm — zP.ll 
lH, — xP,]| = [IxH, s -; P, — AP 
= [KEH, a — PI < mH, — Tl 
Thus, 
\|2P, — z|| < |En — Fl] + lla Hna- — l| > 0 
4. 


Ae oe ai °°" P,P, | Pu n-1 °` P,P, || 
< O(P,-1P,-1 ME P,P) < O(P,-1)O(P,-1) E ó(P.) 0. 
This implies that lim,- P,P,P, ,P, , +++ P,P, = S exists [since the infinite 
sequence of products is bounded]. That S is constant follows from the fact that
ó(S) = lim, ... ó(P, P, P, , TUS P,P.) < lim, ... Ó(P, P, , ss P) m 0 


5. See Exercise 4.d in Section II, A.1. 
6. See Exercise 4.a in Section II, A.1. 
13. Hint: Use Exercise 4 in this section. 


SECTION II, A.4


4. S= {s,,...,5,},8 0 = s. for i= 1,2,...,n—2 and s, Y = {Sn Si}, 
s,I = s, 

9. Hint: Show that at most $2^n$ states can have a common consequent of order
$n$.

10. Let P be defined as follows: P = [p;;] with p,, = 1/n, p,, = 1 — (1/n), p; = 0 
otherwise. It is easy to see that y(P) = lim,.... p,, = 0, but the first column of 
P has all its entries different from zero. 

11. See Exercise 4.d in Section II, A.1. 

12. The matrices $(1/n) \sum_{m=1}^{n} P^m$ are stochastic, and any sequence of
stochastic [therefore bounded] matrices has a convergent subsequence. It suffices
to show that all the convergent subsequences have the same limit. Let
$n_1, n_2, \ldots, n_i, \ldots$ be a subsequence of integers such that
$Q = \lim_{i \to \infty} (1/n_i) \sum_{m=1}^{n_i} P^m$ exists. Then
$QP = PQ = \lim_{i \to \infty} (1/n_i) \sum_{m=2}^{n_i + 1} P^m$, and the two limits
are equal, since they differ by the terms $P^{n_i + 1}/n_i$, $P/n_i$, which tend to
zero when $n_i \to \infty$. Similarly, for any $n$, $Q = QP^n = P^nQ$, which implies
that $Q = QR = RQ$ for any limit $R$ of another subsequence of averages of
matrices. Using the same argument one finds that $R = QR = RQ$, or $R = Q$.
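
A small numerical illustration of why the averages are used: for a periodic
matrix the powers themselves do not converge, while the averages
$(1/n)\sum_{m=1}^{n} P^m$ do. The sketch below is an assumption-laden example
(the 2 x 2 permutation matrix and the function names are chosen here), not
part of the original hint.

```python
def matmul(a, b):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(p, n):
    """Return (1/n) * (P + P^2 + ... + P^n)."""
    size = len(p)
    power = [row[:] for row in p]                  # current power, starts at P
    total = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                total[i][j] += power[i][j]
        power = matmul(power, p)
    return [[x / n for x in row] for row in total]


# Hypothetical periodic chain: its powers alternate between P and the identity.
P = [[0.0, 1.0], [1.0, 0.0]]
even_average = cesaro_average(P, 1000)
odd_average = cesaro_average(P, 1001)
```

Both averages are close to the constant matrix with all entries 1/2, although
the sequence $P, P^2, P^3, \ldots$ has no limit.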

15. See Exercise 12 above. 

17. By Exercise 16, the equations

$$x[I - P] = 0; \qquad \sum_i x_i = 1 \tag{*}$$

have a unique solution. Thus det $[I - P] = 0$ and $[I - P]$ has rank $n - 1$.
The system of equations (*) can be shown to be equivalent to the system

$$(x_1 \cdots x_n)[I - [P - \eta \xi_r]] = \xi_r,$$

where $\eta$ denotes a column vector with all its entries equal to 1 and $\xi_r$ is
the $r$th row of $P$. Thus det $[I - [P - \eta \xi_r]] \neq 0$ and both parts of the
exercise follow.
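
The nonsingular system in Exercise 17 gives a direct way to compute the
invariant vector: solve $x[I - [P - \eta\xi_r]] = \xi_r$, where $\eta\xi_r$ is
the matrix all of whose rows equal the $r$th row of $P$. The sketch below is a
hedged illustration only: the 3 x 3 matrix, the choice $r = 0$, and the helper
names are assumptions, and the elimination routine assumes the system is
nonsingular, as the exercise asserts for the matrices it covers.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination over Fractions (a nonsingular)."""
    n = len(a)
    m = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def invariant_vector(p, r=0):
    """Solve x [I - (P - eta xi_r)] = xi_r, with xi_r the r-th row of P."""
    n = len(p)
    # Entry (i, j) of I - P + eta * xi_r.
    m = [[(1 if i == j else 0) - p[i][j] + p[r][j] for j in range(n)]
         for i in range(n)]
    # x M = xi_r is the same as M^T x^T = xi_r^T.
    mt = [[m[i][j] for i in range(n)] for j in range(n)]
    return solve(mt, p[r])


# Hypothetical stochastic matrix.
P = [[F(1, 2), F(1, 2), 0],
     [F(1, 4), F(1, 2), F(1, 4)],
     [0, F(1, 2), F(1, 2)]]
x = invariant_vector(P)
```

The solution automatically has entries summing to 1, since multiplying the
system on the right by $\eta$ gives $x\eta = 1$.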

18.

$$\begin{bmatrix} 2/5 & 1/5 & 2/5 \\ 2/5 & 1/5 & 2/5 \\ 2/5 & 1/5 & 2/5 \end{bmatrix}$$


21. Let $Q = \lim_{n \to \infty} P^n$. Then $Q$ is constant and $QP^k = Q$ for all
$k$. Let $\pi = (\pi_i)$ be a row of $Q$; then $\pi \eta_i^k = \pi_i$. Thus the
vectors $\eta_i^k$ all satisfy the equation of the $(n - 1)$-dimensional
hyperplane $\pi_1 x_1 + \pi_2 x_2 + \cdots + \pi_n x_n = \pi_i$.


22a. 
i00 4 1100 
acs| 0e 01 xci E 
044 0 0 0 4 4 
$0014 0041 
b. 
YXA4B)-120, 7(A*) =x% = 0 
4 4 0 0 4 4 00 
gel eet , me. 579 
000 I 4400 
000 1 00 4 ¢ 


X4B)—0, 74)>0, 7(B*)> 0 


23. It follows from the assumption that $PQ^n \sim P$. There is $n$ such that $Q^n$
is scrambling, so that $PQ^n$ is also scrambling and hence $P$ is scrambling.


25. Let $Q_x = \lim_{n \to \infty} A(x^n)$; then

$$\|A(yx^n) - Q_x\| \leq \|A(yx^n) - A(x^n)\| + \|A(x^n) - Q_x\|$$

$$= \|A(y)A(x^n) - A(x^n)\| + \|A(x^n) - Q_x A(x^n)\| \leq 2\delta(A(x^n)) + 2\delta(A(x^n)),$$

which tends to zero as $n$ grows.


26. If P = [p] is a matrix of order n such that p; #0 for j = i and j = 
i+ 1 only then P satisfies H, of order n — 1, which is minimal. 


28. Use Exercise 4. 
29. Use Theorem 4.9. 
30. Use Exercise 4.9. 
31. Use Exercise 4.10. 


SECTION II, A.5


6. Hint: Use Exercise 5 and induction on the length of x. 
7. For any 2-state matrix P(o) = [P,,(a)], one can prove that 
u(o) w(c u(o) —u(c 
P(o) = | (0) wk 4 4 al (0) X 1 
u(c) ua) —u,(a) ua) 


where u,(o) = P,,(0)/(P,.(0) + P(o)), uo) = 1 — u,(o). Under the as- 
sumptions u,(o) and u,(o) are independent of o and therefore, 


u, uU; —u, 
P(o,0,) = l 1 | 4 AP» pel | 
ü, th —u, u 
since lim,.,., AX) AP)... AP) — 0, the limiting matrix is 


uu u 
u, Uz 


9. As in Exercise 8, we have that the limiting matrix is 


uu u ; u, —u 
+ lim zA* l 
u, h u, —u 


SECTION II, B.1


6. Using the ordinary probability laws, resolve first the probability of the 
state of the whole system, given the present and past, into the probabilities of 
the next states of the separate systems A and B given the same, then use the 
Markov property of the two systems to eliminate the dependence on the past, 
and then combine back, proving that the resulting probability for the whole 
system depends only on its present situation. 


SECTION II, B.2 


1.b. The matrix $VU$ has stochastic submatrices in its diagonal parts, with
rows and columns corresponding to the same block $\pi_i$ of $\pi$, and, because of
the lumpability condition, every column in the matrix $A(\sigma)V$ has all the
entries corresponding to the same block $\pi_i$ equal to one another. This implies
that $VUA(\sigma)V = A(\sigma)V$.




1.c. Use the property 2.1.b proved above. 


1.d. As in 2.1.b, the matrix $VU$ has stochastic submatrices in its diagonal
parts, with rows and columns corresponding to the same block $\pi_i$ of $\pi$;
moreover, the rows of those submatrices are equal. This fact, together with the
condition that $VUA(\sigma)V = A(\sigma)V$, implies that all the entries in a column
of $A(\sigma)V$ corresponding to the same block $\pi_i$ of $\pi$ are equal to one
another, which implies the lumpability condition.
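
The condition $VUA(\sigma)V = A(\sigma)V$ used in 1.b and 1.d can be tested
mechanically for a single matrix and a given partition. The following is a
sketch under assumptions, not the book's construction: $V$ is taken as the 0/1
block-membership matrix, $U$ as the matrix whose $k$th row is uniform over
block $k$ (one admissible choice with equal rows inside each block), and the
example matrices and names are hypothetical.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Matrix product for possibly rectangular matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lumping_matrices(blocks, n):
    """V[s][k] = 1 if state s lies in block k; U[k][s] = 1/|block k| on block k."""
    v = [[1 if s in block else 0 for block in blocks] for s in range(n)]
    u = [[F(1, len(block)) if s in block else 0 for s in range(n)]
         for block in blocks]
    return u, v


def is_lumpable(a, blocks):
    """Check V U A V == A V for the partition `blocks` of the states of `a`."""
    u, v = lumping_matrices(blocks, len(a))
    av = matmul(a, v)
    return matmul(matmul(v, u), av) == av


blocks = [{0}, {1, 2}]
# Hypothetical matrix whose rows 1 and 2 give equal block sums: lumpable.
A = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 4), F(1, 4), F(1, 2)]]
# Hypothetical matrix whose rows 1 and 2 give different block sums: not lumpable.
B = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [0, F(1, 2), F(1, 2)]]

a_lumpable = is_lumpable(A, blocks)
b_lumpable = is_lumpable(B, blocks)
```

For a machine with several input symbols the same check would be repeated for
each matrix $A(\sigma)$.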


2. 
1 0 
(24100 10 
Up os | V= 
lo Oe 4 0 1 
0 1 
Á 05 0.5 ap [04 06 
?)-—|o5 07? (b) =| 075 025 


3. The system A is equivalent to the cascade product of the two systems 


B = (a, (B(o))) and C = (t, (CC, o) 


with 
"m r0.4 ei dus ls a 
[05 0.5 0.2 0.8 
and 
PENE r0.5 a ie: r0.8 0.27 
[0 1 |0.5 0.5] 
[1 0 0.4 0.6] 
iu cct D^ T €2,5-—|93 07] 


4. The system A is equivalent to the cascade product of the systems B = 
(T,(B(c)) and C = (T,(C(s,o))) with T, = {s,s,}, T; = {s,'s,} when the 
states (5,5,') and (sz, s") of the composite system are merged at the output and 
the transition matrices of B and C are defined as follows 


au [04 06 jp cons afi dn 
() —l045 0.25) (D—lo02 o8| Pa n o2 0.8] 
; 0.5 0.5 4 04 06) oefe 97] 
6&0 =lo75 0257 — CO» 9 —|og 04} $25 —|04 Q3 


SECTION II, C.1


5. Let $P_f$ be a compound sequence matrix of maximal rank for $f$. The entries
in $P_f$ are of the form $f(v_i \sigma v_j')$. Assume that there is some $\delta$ in
$v_i$ and let $v_i = w_i \delta \sigma^{k_i}$, where $k_i$ is some integer [including
0, in which case $\sigma^0 = \lambda$] and $w_i$ is a word. Then
$f(v_i \sigma v_j') = f(w_i \delta \sigma^{k_i} \sigma v_j') = (f(w_i \delta)/f(\delta)) f(\delta \sigma^{k_i} \sigma v_j')$
[by (22)]; one can therefore replace the sequence $v_i$ by the sequence
$\delta \sigma^{k_i}$ without affecting the nonsingularity of the compound sequence
matrix [the factor $f(w_i \delta)/f(\delta)$ multiplies all the entries in the $i$th
row of $P_f$, and it is assumed that $f(\delta) \neq 0$]. If there is no symbol
$\delta$ in $v_i$, then $v_i = \sigma^{k_i}$. To complete the exercise one uses a
similar argument for the columns of $P_f$.


6. Let c, considered as a block of the partition $, over S contain the states 
Stis Sip + +6 9 Si where k(i) is the rank of c,. Consider the partition 5)’ which 
is the same as Y^, but the block c, of } is split into k(i) blocks containing the 
states s;, as their single elements. Let f’ be the function corresponding to the 
new partition >)’. Then f(vo;v') = Y*9! f'(vs;,v^). But the compound sequence 
matrices whose elements are of the form f'(vs;,v") are of rank 1 = r(s,,). 

7. As in Example 16 one can always find a nonsingular diagonal matrix X 
satisfying the condition 4’ = X; and then define A’ = XAX ^! n' = xX^! 
with z'; = nX XÑ = nij = land A'f! = A'Xn = XAG = Xj = if’. 

8. Let G, H, G', H’ be the G and H [see the proof of Theorem 1.12 for de- 
finitions) matrices corresponding to æ and æ’ respectively. æ and Æ’ being 
equivalent we have that GH = G'H' and GAH = G'A'H'. But rank f = |S| 
and therefore G and H are nonsingular so that A = G^'G'A'H'H^! and 
G^'G'H'H^-— GGHH = I. Let now B = G^G' and C = HH’. 

9. Prove that the conditions in Example 17 are satisfied for this case. 


12. Use Exercise 9 above. 


SECTION II, C.2 


7.c. It follows from Corollary 2.13 that $XA(\sigma) = A'(\sigma)X$ for a nonsingular
matrix $X$. If $\epsilon$ is an eigenvalue for $A(\sigma)$, then
$A(\sigma)\xi^T = \epsilon \xi^T$ for some vector $\xi$, and therefore
$A'(\sigma)X\xi^T = XA(\sigma)\xi^T = \epsilon X\xi^T$, which proves that $\epsilon$ is
an eigenvalue for $A'(\sigma)$ with eigenvector $X\xi^T$. Similarly, if $\epsilon$ is
an eigenvalue for $A'(\sigma)$ with row eigenvector $\xi$, then
$\xi X A(\sigma) = \xi A'(\sigma) X = \epsilon \xi X$, so that $\xi X$ is a row
eigenvector for $A(\sigma)$ with the same eigenvalue.

8. Use the Sylvester inequalities for matrices.


10. Let X be the unity matrix with an additional all zero row. Show that 
the conditions of Theorem 2.15 are satisfied for this matrix X under the con- 
ditions of the exercise. 


11. Let $X$ be the matrix whose rows are all $2^{|S|}$ vectors of dimension $|S|$
with entries zero or one. Show that the conditions of Theorem 2.15 are satisfied
for this matrix $X$ under the conditions of the exercise.


12. See Exercise 2.7.c. 


13. Let $t$ be the maximal absolute value of the eigenvalues of a matrix $AA^T$.
It can be shown that $t$ satisfies the inequality
$(\xi A, \xi A) \leq |t|(\xi, \xi)$, where $\xi$ is any row vector and $(\xi, \xi)$
denotes the scalar product of $\xi$ by $\xi$. Let $X$ in Theorem 2.15 be the matrix
with $2^{|S|}$ rows, its rows being all possible $|S|$-dimensional vectors with
entries either 0 or 1. Let $\xi$ be a row of $X$; then
$(\xi A, \xi A) \leq (1/|S|)(\xi, \xi) \leq 1$, for $(\xi, \xi) \leq |S|$. This proves
that the conditions of Theorem 2.15 are satisfied.


SECTION III, A.2


l.h—iftig—itiíf-—s2 

5. Change the matrices $A(\sigma)$ into $(|S| + 1) \times (|S| + 1)$ matrices
$A'(\sigma)$ with first column an all-zero column and first row of the form
$(0, \pi)$, the remaining $|S| \times |S|$ diagonal submatrix of $A'(\sigma)$ being
equal to the matrix $A(\sigma)$.

9. Express $|f - g|$ in terms of the operations "$\vee$" and "$\wedge$" and use
Proposition 2.3.


SECTION III, B


5. Use Exercise 1 in Section III, A.2.


7. Let u = 0, 0,u = 0, --- 0,, and define the following equivalence 
relation R: uRw' if and only if (1) z4(o, --- 0, ,)) = MA +++ 0,4); (2) 
A(Cy|o,) = A(y|o,,’) for all y e Y. R is right invariant, of finite index and 
P"(y|u) > A if and only if P^(y|u) > A. 


The reader will find additional hints and answers by consulting the biblio- 
graphical notes associated with each section. 